Abstract

The use and development of mobile interventions is experiencing rapid growth. In “just-in-time” mobile interventions, treatments intended to help an individual make healthy decisions “in the moment,” and thus to have a proximal, near-future impact, are provided via a mobile device. Currently the development of mobile interventions is proceeding at a much faster pace than that of associated data science methods. A first step toward developing data-based methods is to provide an experimental design for use in testing the proximal effects of these just-in-time treatments. In this paper, we propose a “micro-randomized” trial design for this purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the study, with the result that each participant may be randomized at hundreds or thousands of occasions at which a treatment might be provided. Further, we develop a test statistic for assessing the proximal effect of a treatment as well as an associated sample size calculator. We conduct simulation evaluations of the sample size calculator in various settings. Rules of thumb that might be used in designing a micro-randomized trial are discussed. This work is motivated by our collaboration on the HeartSteps mobile application designed to increase physical activity.
1 Introduction

The use and development of mobile interventions is experiencing rapid growth. Mobile interventions are used across the health fields and include treatments used to improve HIV medication adherence […, 14], to improve activity […], to accompany counseling/pharmacotherapy in substance use [4, …], to reinforce abstinence in addictions […], and to support recovery from alcohol dependence [9, …]. Mobile interventions for maintaining adherence to anti-retroviral therapy and for smoking cessation have shown sufficient effectiveness and replicability in trials and thus have been recommended for inclusion in health services [8].

However, as Nilsen et al. [20] state, “In fact, the development of mHealth technologies is currently progressing at a much faster pace than the science to evaluate their validity and efficacy, introducing the risk that ineffective or even potentially harmful or iatrogenic applications will be implemented.” Indeed, reviews, while reporting preliminary evidence of effectiveness, call for more programmatic, data-based approaches to constructing mobile interventions […]. In particular, these reviews call for research that focuses on data-informed development of these complex multi-component interventions prior to their evaluation in standard randomized controlled trials. But methods for using data to inform the design and evaluation of adaptive mobile interventions have lagged behind the use and deployment of these interventions […, 20, …].

Many mobile interventions are designed to be “just-in-time” interventions, meaning that they intend to provide treatments that help an individual make healthy decisions in the moment, such as engaging in a desirable behavior (e.g., taking a medication on time) or effectively coping with a stressful situation. As such, mobile interventions are often intended to have proximal, near-term effects. A first step toward developing data-based methods for evaluating mobile health interventions is to provide an experimental design for use in testing the proximal effects of the treatments. This paper proposes a micro-randomized trial design for this purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the study, with the result that each participant may be randomized at hundreds or thousands of occasions at which a treatment might be provided. This repeated randomization of the treatments under investigation enables causal modeling of each treatment’s time-varying proximal effect as well as modeling of time-varying effect moderation. Thus, the micro-randomized trial can be seen as a first experimental step in the development of effective mobile interventions that are composed of sequences of treatments. We propose to size the trial to detect the proximal main effect of the treatments. This is akin to the use of factorial designs in constructing multi-component interventions. In these factorial designs [3, …], a first analysis often involves testing whether the main effect of each treatment is equal to 0.

This work is motivated by our collaboration on the HeartSteps mobile application for increasing physical activity, which we will use to illustrate our discussion. One of the treatments in HeartSteps consists of suggestions for physical activity that are tailored to the person’s current context. HeartSteps can deliver these suggestions at any of five time intervals during the day, which correspond roughly to the morning commute, mid-day, mid-afternoon, evening commute, and post-dinner times. When a suggestion is delivered, the user’s phone plays a notification sound, vibrates, and lights up, and the suggestion is displayed on the lock screen of the phone. These suggestions encourage activity in the current context and are intended to have an effect (getting a person to walk) within the next hour.

In the following section, we introduce the micro-randomized trial design. In Section 3 we precisely define the proximal main effect of a treatment, using the language of potential outcomes. In Sections 4 and 5 we develop the test statistic for assessing the proximal effect of a treatment and an associated sample size calculator. In Section 6 we provide a simulation evaluation of the sample size calculator. We end, in Section 7, with a discussion.


2 Micro-Randomized Trial 

In general, an individual’s longitudinal data, recorded via mobile devices that sense and provide treatments, can be written as

$$\{S_0, S_1, A_1, S_2, A_2, \ldots, S_t, A_t, \ldots, S_T, A_T, S_{T+1}\}$$

where t indexes decision times. $S_0$ is a vector of baseline information (gender, ethnicity, etc.) and $S_t$ ($t \ge 1$) is information collected between time $t-1$ and $t$ (e.g., summary measures of recent activity levels, engagement, and burden; day of week; weather; busyness indicated by the smartphone calendar). The treatment at time t is denoted by $A_t$; throughout this paper we consider binary options for the treatments (e.g., the treatment is on or off). The proximal response, denoted by $Y_{t+1}$, is a known function of $(S_t, A_t, S_{t+1})$. Here we assume that the longitudinal data are independent and identically distributed across N individuals. This assumption would be violated if, for example, some of the treatments are used to enhance social support between individuals in the study.

In HeartSteps, the data ($S_t$) are collected both passively via sensors and via participant self-report. Each participant is provided a “Jawbone” band, worn at the wrist, which collects daily step count and the amount of sleep the user had the previous night. Furthermore, sensors on the phone are used to collect a variety of information at each of the 5 time points during the day, including the time stamp, location, busyness of planned activities on the phone calendar, and other activity on the phone. Each evening, self-report data are collected, including utility and burden ratings. The proximal response, $Y_{t+1}$, for activity suggestions is the step count in the hour following time t.

A decision time is a point in time at which, based on the participant’s current state, past behavior, or current context, treatment may need to be delivered. Decision times vary with the nature of the intervention component. In HeartSteps, the decision times for activity suggestions are 5 times per day over the 42-day study duration. For an alcohol-recovery application that provides an intervention when an individual comes within 10 feet of a high-risk location (e.g., a liquor store), decision times might be every 2 minutes, the frequency at which the application would obtain the person’s current location and assess whether she is close to a high-risk location. In a long-term study of an intervention for multiple health behaviors, the decision times might be weekly or monthly; at these times, decisions are made regarding whether to change the focus from one behavior (e.g., physical activity) to another (e.g., diet). Finally, in many studies there is an option for an individual to press a “panic” button, indicating the need for help; for such interventions, decision times correspond to times at which the panic button might be pressed.

A micro-randomized trial is a trial in which, at each decision time t, participants are randomized to a treatment option, denoted by $A_t$. Treatment options may correspond to whether or not a treatment is provided at a decision time; for example, in HeartSteps, whether or not the individual is provided a lock-screen activity suggestion. Alternatively, treatment options may be alternative types of treatment that can be provided at the same decision time; for example, a daily step goal treatment might have two options, a fixed 10,000-steps-a-day goal or an adaptive goal based on the user’s activity level on the previous day. Considerations of treatment burden often imply that the randomization will not be uniform. For example, in HeartSteps, $P[A_t = 1] = 0.4$ so that, if an individual is always available, on average $5 \times 0.4 = 2$ lock-screen activity messages are delivered per day.
In designing a micro-randomized trial, that is, in determining its sample size, we focus on the reduced longitudinal data

$$\{S_0, I_1, A_1, Y_2, I_2, A_2, Y_3, \ldots, I_t, A_t, Y_{t+1}, \ldots, I_T, A_T, Y_{T+1}\}.$$

The variable $I_t$ is an “availability” indicator, coded as $I_t = 1$ if the individual is available for treatment and $I_t = 0$ otherwise. At some decision times, feasibility, ethics, or burden considerations mean that the individual is unavailable for treatment, and thus $A_t$ should not be delivered. Consider again HeartSteps: if sensors indicate that the individual is likely driving a car or is currently walking, then the lock-screen activity message should not occur. Other examples of when individuals are unavailable for treatment include the alcohol-recovery setting, in which a “warning” treatment would only potentially be provided when sensors indicate that the individual is within 10 feet of a high-risk location, or a treatment might only be provided if the individual reports a high level of craving. If the application has a panic button, then it is appropriate to provide “panic button” treatments only in an X-second interval in which the panic button is pressed. Individuals may also be unavailable for treatment by choice. For example, the HeartSteps application permits the individual to turn off the lock-screen activity messages; this option is considered critical to maintaining participant buy-in and engagement with HeartSteps. After viewing a lock-screen activity message, the individual has the option of turning off the lock-screen messages for 4, 8, or 12 hours. After the specified time interval, the lock-screen messages automatically turn on again. To summarize, the availability indicator at time t is the indicator for the subpopulation at time t among which we are interested in assessing the proximal main effect of the treatment; we are not interested in assessing the proximal main effect of a treatment among individuals for whom it is unethical to provide treatment, for whom it makes no scientific sense to provide treatment, or who refuse to be provided a treatment.
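As a minimal sketch of the design just described (not the HeartSteps software), the following Python simulates one participant's availability indicators and micro-randomized treatments. It assumes, hypothetically, that availability is an independent Bernoulli draw with a constant probability of 0.7 and that $A_t$ is recorded as 0 whenever $I_t = 0$; the function name and parameter values are illustrative only.

```python
import numpy as np


def simulate_micro_randomization(n_days=42, times_per_day=5, p=0.4,
                                 availability=0.7, seed=0):
    """Simulate (I_t, A_t) for one participant in a micro-randomized trial.

    Hypothetical assumptions: I_t ~ Bernoulli(availability) independently at
    each decision time, and A_t ~ Bernoulli(p) is delivered only when I_t = 1
    (A_t is coded 0 when the participant is unavailable).
    """
    rng = np.random.default_rng(seed)
    T = n_days * times_per_day
    I = rng.binomial(1, availability, size=T)   # availability indicators I_t
    coin = rng.binomial(1, p, size=T)           # randomization draws
    A = np.where(I == 1, coin, 0)               # treat only at available decision times
    return I, A


# With P[A_t = 1] = 0.4 and 5 decision times per day, an always-available
# participant receives 5 * 0.4 = 2 messages per day on average.
expected_per_day = 5 * 0.4
I, A = simulate_micro_randomization()
observed_per_day = A.sum() / 42
```

Because randomization happens only at available decision times, lowering `availability` lowers the expected number of delivered messages proportionally.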


3 Proximal Main Effect of a Treatment 

As discussed above, treatments in mobile health interventions are often designed to have a proximal effect (e.g., to increase activity in the near future, help an individual manage current cravings for drugs or food, or take medications on schedule). As a result, a first question in developing a mobile health intervention is whether the treatments have a proximal effect. Here we develop sample size formulae that guarantee a stated power to detect the proximal effect of a treatment. In particular, we aim to test whether the proximal main effect is zero.

To define the proximal main effect of a treatment, we use potential outcomes [22]. Our use of potential outcome notation is slightly more complicated than usual because treatment can only be provided when an individual is available. As a result, we index the potential outcomes by decision rules that incorporate availability. In particular, define $d(a, i)$ for $a \in \{0,1\}$, $i \in \{0,1\}$ by $d(a, 0) =$ “unavailable, do nothing” and $d(a, 1) = a$. Then for each $a_1 \in \mathcal{A}_1 = \{0,1\}$, define $D_1(a_1) = d(a_1, I_1)$. We denote the potential proximal responses following decision time 1 by $\{Y_2(D_1(a_1)) : a_1 \in \mathcal{A}_1\}$ and the potential availability indicators at decision time 2 by $\{I_2(D_1(a_1)) : a_1 \in \mathcal{A}_1\}$. Next, for each $\bar a_2 = (a_1, a_2)$ with $a_1, a_2 \in \{0,1\}$, define $D_2(\bar a_2) = d(a_2, I_2(D_1(a_1)))$ and $\bar D_2(\bar a_2) = (D_1(a_1), D_2(\bar a_2))$. A potential proximal response following decision time 2 and corresponding to $\bar a_2$ is $Y_3(\bar D_2(\bar a_2))$, and a potential availability indicator at decision time 3 is $I_3(\bar D_2(\bar a_2))$. Similarly, for each $\bar a_t = (a_1, \ldots, a_t) \in \bar{\mathcal{A}}_t = \{(a_1, \ldots, a_t) \mid a_i \in \{0,1\}, i = 1, \ldots, t\}$, define $D_t(\bar a_t) = d(a_t, I_t(\bar D_{t-1}(\bar a_{t-1})))$ and $\bar D_t(\bar a_t) = (D_1(a_1), \ldots, D_t(\bar a_t))$. For each $\bar a_t \in \bar{\mathcal{A}}_t$, the potential proximal response following decision time t is $Y_{t+1}(\bar D_t(\bar a_t))$ and the potential availability indicator at decision time $t+1$ is $I_{t+1}(\bar D_t(\bar a_t))$.

We define the proximal main effect of a treatment at time t among available individuals by

$$\beta(t) = E\left[ Y_{t+1}\big(\bar D_{t-1}(\bar A_{t-1}), 1\big) - Y_{t+1}\big(\bar D_{t-1}(\bar A_{t-1}), 0\big) \,\middle|\, I_t\big(\bar D_{t-1}(\bar A_{t-1})\big) = 1 \right],$$

where the expectation is taken with respect to the distribution of the potential outcomes and the randomization in $\bar A_{t-1}$. This proximal effect is conditional in that the effect of treatment at time t is defined only for individuals available for treatment at time t, that is, those with $I_t(\bar D_{t-1}(\bar A_{t-1})) = 1$. The proximal effect is a main effect in that it is marginal over any effects of $\bar A_{t-1}$. The conditional aspect of the definition is related to the concept of viable or feasible dynamic treatment regimes, in which one assesses only the causal effect of treatments that can actually be provided.

Consider the proximal main effect, $\beta(t)$, as t varies. $\beta(t)$ may vary across time for a variety of reasons. To see this, consider the case of HeartSteps. Here $\beta(t)$ might initially increase with t as participants learn and practice the activities suggested on the lock screen. For larger t one might expect to see a decreasing or flat $\beta(t)$ due to habituation (participants begin to, at least partially, ignore the messages). This time variation in $\beta(t)$ can be attributed both to the immediate effect of a lock-screen activity message and to interactions between past lock-screen activity messages and the present one; the time variation occurs at least partially because of the marginal character of $\beta(t)$. Alternatively, the conditional definition of $\beta(t)$ means that the effect is only defined among the population of individuals who are available at decision time t. Changes in this population may cause changes in $\beta(t)$ across time. Again consider HeartSteps. At earlier time points, participants are highly engaged yet have not developed habits that in various ways increase their activity; thus most participants will be available. However, as time progresses, some participants may develop sufficiently positive activity habits or anticipate activity suggestions; at later decision times these participants may already be active and thus unavailable to receive a suggestion. Other participants may become increasingly disengaged and repeatedly turn off the lock-screen activity messages; these participants are also unavailable. Thus, as time progresses, $\beta(t)$ may vary because the subpopulation of participants among whom it is appropriate to assess the effect of the lock-screen activity message changes.

Our main objective in determining the sample size will be to assure sufficient power to detect alternatives to the null hypothesis of no proximal main effect, $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, for a trial with T decision times (if $\beta(t)$ is nonzero, then there is a proximal effect for the population available at decision time t). The proposed test will be focused on detecting smooth, i.e., continuous in t, alternatives to this null hypothesis.

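To express $\beta(t)$ in terms of the observed data distribution, we assume consistency [22, …]. This assumption is that for each t, the observed $Y_t$ and observed $I_t$ equal the corresponding potential outcomes, $Y_t = Y_t(\bar D_{t-1}(\bar a_{t-1}))$ and $I_t = I_t(\bar D_{t-1}(\bar a_{t-1}))$, whenever $\bar A_{t-1} = \bar a_{t-1}$. This assumption may be violated if some of the treatments promote social linkages between participants, for example, to enhance social/emotional support or to compete in mobile games. In these cases it would be more appropriate to additionally index each individual’s potential outcomes by other participants’ treatments. The micro-randomization plus the consistency assumption implies that the proximal main effect of treatment at time t among available individuals, $\beta(t)$, can be written as

$$
\begin{aligned}
\beta(t) &= E\left[Y_{t+1}\big(\bar D_{t-1}(\bar A_{t-1}), 1\big) \,\middle|\, I_t\big(\bar D_{t-1}(\bar A_{t-1})\big) = 1\right] - E\left[Y_{t+1}\big(\bar D_{t-1}(\bar A_{t-1}), 0\big) \,\middle|\, I_t\big(\bar D_{t-1}(\bar A_{t-1})\big) = 1\right] \\
&= E\left[Y_{t+1}\big(\bar D_{t-1}(\bar A_{t-1}), 1\big) \,\middle|\, I_t\big(\bar D_{t-1}(\bar A_{t-1})\big) = 1, A_t = 1\right] - E\left[Y_{t+1}\big(\bar D_{t-1}(\bar A_{t-1}), 0\big) \,\middle|\, I_t\big(\bar D_{t-1}(\bar A_{t-1})\big) = 1, A_t = 0\right] \\
&= E\left[Y_{t+1} \mid I_t = 1, A_t = 1\right] - E\left[Y_{t+1} \mid I_t = 1, A_t = 0\right],
\end{aligned}
$$

where the second equality follows from the randomization of the $A_t$’s and the last equality follows from the consistency assumption.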
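The last line of the display suggests a direct plug-in estimator: at each decision time, the difference in mean proximal response between treated and untreated available participants. The Python sketch below illustrates this on hypothetical simulated data (the generative model, sample sizes, and function name are assumptions for illustration); it is not the test statistic developed in Section 4.

```python
import numpy as np


def proximal_effect_by_time(Y, I, A):
    """Difference-in-means estimate of beta(t) at each decision time.

    Y, I, A are (N, T) arrays; Y[:, t] holds the proximal response that follows
    decision time t. Entries where a group is empty are returned as NaN.
    """
    Y, I, A = (np.asarray(x) for x in (Y, I, A))
    treated = (I == 1) & (A == 1)
    control = (I == 1) & (A == 0)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_treated = (Y * treated).sum(axis=0) / treated.sum(axis=0)
        mean_control = (Y * control).sum(axis=0) / control.sum(axis=0)
    return mean_treated - mean_control


# Hypothetical simulation: N participants, T decision times, known true beta(t).
rng = np.random.default_rng(0)
N, T = 20_000, 10
beta_true = np.linspace(0.0, 0.3, T)
I = rng.binomial(1, 0.6, size=(N, T))                 # availability
A = I * rng.binomial(1, 0.4, size=(N, T))             # randomized only when available
Y = 1.0 + beta_true * A + rng.standard_normal((N, T))
beta_hat = proximal_effect_by_time(Y, I, A)
```

With only between-subject contrasts at each t, this estimator is noisy for realistic N, which is one motivation for pooling across time through a parametric alternative.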


4 Test Statistic 

Our sample size formula is based on a test statistic for use in testing $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, against a scientifically plausible alternative. This alternative should be formed based on conversations with domain experts. Here we construct a test statistic to detect alternatives that are, at least approximately, linear in a vector parameter $\beta$, that is, alternatives of the form $Z_t^\top \beta$, where the $p \times 1$ vector $Z_t$ is a function of t and of covariates that are unaffected by treatment, such as time of day or day of week. In the case of HeartSteps, a plausible alternative is quadratic:

$$Z_t^\top \beta = \left(1, \left\lfloor \tfrac{t-1}{5} \right\rfloor, \left\lfloor \tfrac{t-1}{5} \right\rfloor^2\right) \beta \qquad (1)$$

where $\beta = (\beta_1, \beta_2, \beta_3)^\top$ ($p = 3$). Recall that in HeartSteps there are 5 decision times per day; $\lfloor (t-1)/5 \rfloor$ translates decision time t to days. This rather simplistic parametrization marginalizes across the day and treats weekends and weekdays similarly.
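For concreteness, the quadratic alternative (1) can be encoded as a $T \times 3$ design matrix whose t-th row is $Z_t^\top$. The Python sketch below assumes T = 210 (42 days of 5 decision times, as in HeartSteps) and uses hypothetical coefficient values only to show how $\beta(t) = Z_t^\top \beta$ is evaluated.

```python
import numpy as np


def quadratic_design(T=210, times_per_day=5):
    """Rows are Z_t^T = (1, floor((t-1)/5), floor((t-1)/5)^2) for t = 1, ..., T."""
    t = np.arange(1, T + 1)
    day = (t - 1) // times_per_day       # 0 on the first day, 41 on day 42
    return np.column_stack([np.ones(T), day, day ** 2]).astype(float)


Z = quadratic_design()
beta = np.array([0.0, 0.02, -0.0004])    # hypothetical coefficients, not elicited values
effect = Z @ beta                        # beta(t) under the alternative (1), t = 1..T
```

All five decision times within a day share the same row, which is exactly the "marginalizes across the day" simplification noted above.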

We propose to use the alternative $H_1: \beta(t) = Z_t^\top \beta$, $t = 1, \ldots, T$, to construct the test statistic. We base the test statistic on the estimator of $\beta$ in a least squares fit of a working model. A simple working model based on the alternative is

$$E[Y_{t+1} \mid I_t = 1, A_t] = B_t^\top \alpha + (A_t - p_t) Z_t^\top \beta \qquad (2)$$

over all $t \in \{1, \ldots, T\}$, where $p_t$ is the known randomization probability ($P[A_t = 1] = p_t$) and the $q \times 1$ vector $B_t$ is a function of t and of covariates that are unaffected by treatment, such as time of day or day of week. Note that $A_t$ is centered by subtracting off the randomization probability; thus the working model for $\alpha(t) = E[Y_{t+1} \mid I_t = 1]$ is

$$\alpha(t) = B_t^\top \alpha. \qquad (3)$$

The estimators $\hat\alpha, \hat\beta$ minimize the least squares criterion

$$\mathbb{P}_N \left[ \sum_{t=1}^T I_t \left( Y_{t+1} - B_t^\top \alpha - (A_t - p_t) Z_t^\top \beta \right)^2 \right],$$

where $\mathbb{P}_N\{f(X)\}$ is defined as the average of $f(X)$ over the sample.
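The criterion is an ordinary least squares problem over the available person-times, with design row $(B_t^\top, (A_t - p_t) Z_t^\top)$. A Python sketch using `numpy.linalg.lstsq`, under the hypothetical choice $B_t = Z_t = (1, \text{day}, \text{day}^2)$ and simulated data whose true coefficients are known; the function name and all numeric values are illustrative assumptions.

```python
import numpy as np


def fit_working_model(Y, I, A, p, B, Z):
    """Least-squares fit of working model (2); returns (alpha_hat, beta_hat).

    Y, I, A : (N, T) arrays; Y[i, t] is participant i's response after decision time t.
    p       : length-T randomization probabilities p_t.
    B, Z    : (T, q) and (T, p_dim) arrays whose rows are B_t^T and Z_t^T.
    Only available person-times (I_t = 1) enter, mirroring the factor I_t.
    """
    mask = np.asarray(I, dtype=bool)
    t_idx = np.nonzero(mask)[1]                      # decision time of each available row
    centered = (A[mask] - p[t_idx])[:, None]         # A_t - p_t
    X = np.hstack([B[t_idx], centered * Z[t_idx]])   # [B_t^T, (A_t - p_t) Z_t^T]
    coef, *_ = np.linalg.lstsq(X, Y[mask], rcond=None)
    q = B.shape[1]
    return coef[:q], coef[q:]


# Hypothetical simulated trial with B_t = Z_t = (1, day, day^2).
rng = np.random.default_rng(1)
N, T, p_const = 1000, 210, 0.4
day = np.arange(T) // 5
Z = np.column_stack([np.ones(T), day, day ** 2]).astype(float)
B = Z.copy()
p = np.full(T, p_const)
alpha_true = np.array([1.0, 0.01, 0.0])
beta_true = np.array([0.0, 0.02, -0.0004])
I = rng.binomial(1, 0.5, size=(N, T))
A = I * rng.binomial(1, p_const, size=(N, T))
Y = B @ alpha_true + (A - p) * (Z @ beta_true) + rng.standard_normal((N, T))
alpha_hat, beta_hat = fit_working_model(Y, I, A, p, B, Z)
```

Centering $A_t$ by $p_t$ makes the treatment columns uncorrelated with $B_t$ in expectation, which is why $\hat\beta$ targets $\beta(t)$ even if the baseline model $B_t^\top \alpha$ is misspecified.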

Note that from a technical perspective, minimizing the least squares criterion above is reminiscent of a GEE analysis […] with identity link function and a working correlation matrix equal to the identity. Thus it is natural to consider a non-identity working correlation matrix, as is common in GEE. This, however, is problematic from a causal inference perspective. To see this, suppose that the true conditional expectation is in fact $E[Y_{t+1} \mid I_t = 1, A_t] = B_t^\top \alpha^* + (A_t - p_t) Z_t^\top \beta^*$, that is, the causal parameter $\beta(t)$ is equal to $Z_t^\top \beta^*$. Further suppose that the working correlation matrix has nonzero off-diagonal elements and that we estimate $\beta^*$ by minimizing the weighted (by the inverse of the working correlation matrix) least squares criterion. In this case the resulting estimating equations include sums of terms such as $I_t \left[ Y_{t+1} - B_t^\top \alpha - (A_t - p_t) Z_t^\top \beta \right] I_s (A_s - p_s) Z_s$ for $t > s$. Unfortunately, both availability at time t, $I_t$, and $Y_{t+1}$ may be affected by past treatment (in particular, $A_s$); thus, absent strong assumptions, $E\left[ I_t \left( Y_{t+1} - B_t^\top \alpha^* - (A_t - p_t) Z_t^\top \beta^* \right) I_s (A_s - p_s) \right]$ is unlikely to be 0. Recall that a minimal condition for consistency of estimators of $(\alpha^*, \beta^*)$ is that the estimating equations have expectation 0; thus, absent further assumptions, the estimators derived from the weighted least squares criterion are likely biased. Another possibility is to include a time-varying variance term in the least squares criterion, that is, the t-th term in the criterion might be weighted by the inverse of a time-varying variance. This would be useful in the data analysis; however, for sample size calculations, values of these variances are unlikely to be available. Thus, for simplicity, we use the unweighted least squares criterion above.

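Assume that the matrices $Q = \sum_{t=1}^T E[I_t] p_t (1 - p_t) Z_t Z_t^\top$ and $\sum_{t=1}^T E[I_t] B_t B_t^\top$ are invertible. The least squares estimators $\hat\alpha$, $\hat\beta$ are consistent estimators of

$$\left[ \sum_{t=1}^T E[I_t] B_t B_t^\top \right]^{-1} \sum_{t=1}^T E[I_t]\, \alpha(t) B_t \qquad (4)$$

and

$$\left[ \sum_{t=1}^T E[I_t] p_t (1 - p_t) Z_t Z_t^\top \right]^{-1} \sum_{t=1}^T E[I_t] p_t (1 - p_t)\, \beta(t) Z_t, \qquad (5)$$

respectively. Furthermore, if $\beta(t)$ is in fact equal to $Z_t^\top \beta$ for some $\beta$, then the limit in (5) equals $\beta$, so that it recovers $\beta(t) = Z_t^\top \beta$. This is the case even if $E[Y_{t+1} \mid I_t = 1] \neq B_t^\top \alpha$. In the appendix (Lemma …), we prove these results and also show that, under moment conditions, $\sqrt{N}(\hat\beta - \beta)$ is asymptotically normal with mean 0 and variance $\Sigma_\beta = Q^{-1} W Q^{-1}$, where

$$W = E\left[ \left( \sum_{t=1}^T \epsilon_t I_t (A_t - p_t) Z_t \right) \left( \sum_{t=1}^T \epsilon_t I_t (A_t - p_t) Z_t^\top \right) \right]$$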


and $\epsilon_t = Y_{t+1} - I_t B_t^\top \alpha - (A_t - p_t) I_t Z_t^\top \beta$. To test the null hypothesis $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, one can use a test statistic based on the alternative, e.g.,

$$N \hat\beta^\top \hat\Sigma_\beta^{-1} \hat\beta, \qquad (6)$$

where $\hat\Sigma_\beta = \hat Q^{-1} \hat W \hat Q^{-1}$ and $\hat Q$ and $\hat W$ are plug-in estimators. Note that this test statistic results from a GEE analysis with identity link function and a working correlation matrix equal to the identity matrix, for which sample size formulae have been developed [27]. We build on this work as follows. As Tu et al. [27] discuss, under the null hypothesis the large-sample distribution of this statistic is chi-squared with p degrees of freedom. If N, the sample size, is small, then, as recommended in […], we make small adjustments to improve the small-sample approximation to the distribution of the test statistic. In particular, Mancl and DeRouen recommend adjusting W using the “hat” matrix; see the appendix for the formulae for the adjusted W as well as Q. Also, in small-sample settings, investigators commonly suggest that instead of a critical value based on the chi-squared distribution, a critical value based on the t-distribution should be used [15]. As we are considering a simultaneous test for multiple parameters, we form the critical value based on Hotelling’s T-squared distribution […]. Hotelling’s T-squared distribution is a multiple of the F distribution, given by $T^2_{d_1, d_2} = \frac{d_1 (d_1 + d_2 - 1)}{d_2} F_{d_1, d_2}$; here we use $d_1 = p$ and $d_2 = N - q - p$ (recall q is the number of parameters in the nuisance parameter vector $\alpha$); see the appendix for a rationale. In the following, the rejection region for the test of $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, based on (6) is

$$N \hat\beta^\top \hat\Sigma_\beta^{-1} \hat\beta > \frac{p (N - q - 1)}{N - q - p} F^{-1}_{p, N-q-p}(1 - \alpha_0),$$

where $\alpha_0$ is the desired significance level.
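A Python sketch of the plug-in sandwich variance $\hat\Sigma_\beta = \hat Q^{-1} \hat W \hat Q^{-1}$ (without the Mancl and DeRouen small-sample adjustment) and of the Hotelling-based rejection rule above, using `scipy.stats.f`. It takes $\hat Q = \sum_t \bar I_t p_t (1 - p_t) Z_t Z_t^\top$, with $\bar I_t$ the sample mean of $I_t$, which is one reasonable plug-in choice and not necessarily the one used in the appendix; the simulated data, generated under $H_0$ with $B_t = Z_t$, and all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def sandwich_variance(Y, I, A, p, B, Z, alpha_hat, beta_hat):
    """Plug-in Sigma_beta = Q^{-1} W Q^{-1} (no small-sample adjustment)."""
    N = Y.shape[0]
    # Q_hat = sum_t (mean_i I_it) p_t (1 - p_t) Z_t Z_t^T
    Q = Z.T @ (Z * (I.mean(axis=0) * p * (1 - p))[:, None])
    # eps_it = Y_it - I_it B_t^T alpha - (A_it - p_t) I_it Z_t^T beta
    eps = Y - I * (B @ alpha_hat) - (A - p) * I * (Z @ beta_hat)
    U = (eps * I * (A - p)) @ Z          # row i: sum_t eps_it I_it (A_it - p_t) Z_t^T
    W = U.T @ U / N
    Q_inv = np.linalg.inv(Q)
    return Q_inv @ W @ Q_inv


def hotelling_test(beta_hat, sigma_beta, N, q, alpha0=0.05):
    """Statistic (6), the Hotelling-based critical value, and the reject decision."""
    p = len(beta_hat)
    stat = N * beta_hat @ np.linalg.solve(sigma_beta, beta_hat)
    d2 = N - q - p
    crit = p * (N - q - 1) / d2 * stats.f.ppf(1 - alpha0, p, d2)
    return stat, crit, stat > crit


# Hypothetical data generated under H0 (beta(t) = 0), with B_t = Z_t = (1, day, day^2).
rng = np.random.default_rng(2)
N, T = 40, 210
day = np.arange(T) // 5
Z = np.column_stack([np.ones(T), day, day ** 2]).astype(float)
B = Z.copy()
p = np.full(T, 0.4)
I = rng.binomial(1, 0.5, size=(N, T))
A = I * rng.binomial(1, 0.4, size=(N, T))
Y = 1.0 + rng.standard_normal((N, T))

# Least-squares fit of working model (2) over available person-times.
mask = I.astype(bool)
t_idx = np.nonzero(mask)[1]
X = np.hstack([B[t_idx], (A[mask] - p[t_idx])[:, None] * Z[t_idx]])
coef = np.linalg.lstsq(X, Y[mask], rcond=None)[0]
alpha_hat, beta_hat = coef[:3], coef[3:]

sigma_beta = sandwich_variance(Y, I, A, p, B, Z, alpha_hat, beta_hat)
stat, crit, reject = hotelling_test(beta_hat, sigma_beta, N, q=3)
```

For small N the Hotelling critical value exceeds the chi-squared quantile $\chi^2_{p}(1 - \alpha_0)$, and it converges to that quantile as N grows, which is the intended small-sample correction.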
5 Sample Size Formulae

As Tu et al. [27] have developed general sample size formulas in the GEE setting, here we focus on considerations specific to the setting of micro-randomized trials. To size the study, we determine the sample size needed to detect the alternative

$$H_1: \beta(t)/\bar\sigma = d(t), \quad t = 1, \ldots, T,$$

where $\bar\sigma^2 = (1/T) \sum_{t=1}^T E[\mathrm{Var}(Y_{t+1} \mid I_t = 1, A_t)]$ is the average variance and $d(t)$ is a standardized treatment effect.

When N is large and $H_1$ holds, $N \hat\beta^\top \hat\Sigma_\beta^{-1} \hat\beta$ is approximately distributed as a noncentral chi-squared $\chi^2_p(c_N)$, where $c_N$, the non-centrality parameter, satisfies $c_N = N (\bar\sigma d)^\top \Sigma_\beta^{-1} (\bar\sigma d)$, and

$$d = \left( E\left[ \sum_{t=1}^T I_t p_t (1 - p_t) Z_t Z_t^\top \right] \right)^{-1} E\left[ \sum_{t=1}^T I_t p_t (1 - p_t)\, d(t) Z_t \right]$$

[27]. Note that $\beta = \bar\sigma d$.

Working Assumptions. To derive the sample size formula, we use the form of the non-centrality parameter of the limiting non-central chi-squared distribution, along with working assumptions. The working assumptions are used to simplify the form of $\Sigma_\beta$. In particular, we make the following working assumptions:

(a) $E[Y_{t+1} \mid I_t = 1] = B_t^\top \alpha$ for some $\alpha \in \mathbb{R}^q$;

(b) $\beta(t) = Z_t^\top \beta$ for some $\beta \in \mathbb{R}^p$;

(c) $\mathrm{Var}(Y_{t+1} \mid I_t = 1, A_t)$ is constant in t and $A_t$;

(d) $E[\epsilon_t \epsilon_s \mid I_t = I_s = 1, A_t, A_s]$ is constant in $A_t, A_s$,

where, as before, $\epsilon_t = Y_{t+1} - I_t B_t^\top \alpha - (A_t - p_t) I_t Z_t^\top \beta$. See the proof in the appendix (Lemma …). The above working assumptions are somewhat simplistic, but, as will be seen below, the resulting sample size formula is robust to moderate violations. First, under these working assumptions the alternative hypothesis can be re-written as

$$H_1: \beta = \bar\sigma d, \qquad (7)$$

where d is a p-dimensional vector of standardized effects. Furthermore, $\Sigma_\beta$ is given by

$$\Sigma_\beta = \bar\sigma^2 \left[ \sum_{t=1}^T E[I_t] p_t (1 - p_t) Z_t Z_t^\top \right]^{-1},$$

and thus $c_N$ is given by

$$c_N = N d^\top \left( \sum_{t=1}^T E[I_t] p_t (1 - p_t) Z_t Z_t^\top \right) d. \qquad (8)$$


To improve the small-sample approximation, we use the multiple of the F distribution discussed above. Thus the sample size, N, is found by solving

$$1 - F_{p, N-q-p; c_N}\left( F^{-1}_{p, N-q-p}(1 - \alpha_0) \right) = 1 - \beta_0, \qquad (9)$$

where $F_{p, N-q-p; c_N}$ is the distribution function of the noncentral F distribution with noncentrality parameter $c_N$, $F^{-1}_{p, N-q-p}$ is the quantile function of the central F distribution, and $1 - \beta_0$ is the desired power. The inputs to this sample size formula are a scientifically meaningful value for d (see below for an illustration), the time-varying availability pattern $\{E[I_t]\}_{t=1}^T$, the desired significance level $\alpha_0$, and the desired power $1 - \beta_0$.
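A sketch of solving (9) numerically in Python, using SciPy's central and noncentral F distributions (`scipy.stats.f` and `scipy.stats.ncf`). The search simply increases N until the approximate power reaches the target. The HeartSteps-like inputs (constant availability, $p_t = 0.4$, $q = 3$, and a concave quadratic d with no initial effect, average 0.10, and peak at day index 28) are assumptions for illustration, and the output is not claimed to reproduce Table 1 exactly.

```python
import numpy as np
from scipy import stats


def power(N, d, Z, avail, p_t, q, alpha0=0.05):
    """Approximate power 1 - F_{p, N-q-p; c_N}(F^{-1}_{p, N-q-p}(1 - alpha0))."""
    p = Z.shape[1]
    weights = avail * p_t * (1 - p_t)              # E[I_t] p_t (1 - p_t)
    M = Z.T @ (Z * weights[:, None])               # sum_t E[I_t] p_t (1 - p_t) Z_t Z_t^T
    c_N = N * d @ M @ d                            # noncentrality parameter (8)
    d2 = N - q - p
    crit = stats.f.ppf(1 - alpha0, p, d2)
    return 1 - stats.ncf.cdf(crit, p, d2, c_N)


def sample_size(d, Z, avail, p_t, q, alpha0=0.05, target_power=0.80, n_max=100_000):
    """Smallest N (with N - q - p >= 1) whose approximate power reaches target_power."""
    p = Z.shape[1]
    for N in range(q + p + 1, n_max + 1):
        if power(N, d, Z, avail, p_t, q, alpha0) >= target_power:
            return N
    raise ValueError("target power not reached for N <= n_max")


# Hypothetical HeartSteps-like inputs.
T = 210
day = np.arange(T) // 5
Z = np.column_stack([np.ones(T), day, day ** 2]).astype(float)
peak = 28                                   # day index 28 = study day 29 (assumed mapping)
shape = 2 * peak * day - day ** 2           # zero on day 1, maximal at `peak`
a = 0.10 / shape.mean()                     # scale so the average standardized effect is 0.10
d = np.array([0.0, 2 * peak * a, -a])
p_t = np.full(T, 0.4)
N_05 = sample_size(d, Z, np.full(T, 0.5), p_t, q=3)
N_04 = sample_size(d, Z, np.full(T, 0.4), p_t, q=3)
```

Because $c_N$ scales with $E[I_t]$, lower assumed availability always requires at least as many participants, which is why a conservative (low) availability input is the safe choice.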

Now we describe how the information needed in the sample size formula might be obtained when the alternative is quadratic ($p = 3$, (1)). In this case we first elicit the initial standardized proximal main effect, given by $Z_1^\top \beta / \bar\sigma = \beta_1 / \bar\sigma = d_1$. Second, we elicit the average across time of the standardized proximal main effect, $\bar d = (1/T) \sum_{t=1}^T Z_t^\top \beta / \bar\sigma$. Lastly, we elicit the time at which the proximal main effect is maximal, i.e., $\arg\max_t Z_t^\top \beta$. These three quantities can then be used to solve for $d = (d_1, d_2, d_3)^\top$. For example, in HeartSteps, we might want to determine the sample size needed to ensure 80% power when there is no initial treatment effect on the first day and the maximal proximal main effect occurs around day 29. We specify the expected availability, $E[I_t]$, to be constant in t, and $Z_t$ is given by (1). Table 1 gives sample sizes for HeartSteps under a variety of average standardized proximal main effects ($\bar d$).
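For the quadratic alternative (1), the three elicited quantities determine d in closed form: the initial effect fixes $d_1$, placing the vertex of the parabola at the peak day gives $d_2 = -2 k^* d_3$, and the average effect then fixes $d_3$ through $\bar d - d_1 = d_3 (m_2 - 2 k^* m_1)$, where $m_1$ and $m_2$ are the time averages of $\lfloor (t-1)/5 \rfloor$ and its square. A Python sketch, assuming the peak is supplied as a day index $k^* = \lfloor (t-1)/5 \rfloor$, so that "day 29" corresponds to $k^* = 28$ (this mapping and the function name are our assumptions):

```python
import numpy as np


def elicit_quadratic_d(avg_effect, peak_day_index, initial_effect=0.0, T=210, times_per_day=5):
    """Solve for d = (d1, d2, d3) in Z_t^T d = d1 + d2*k + d3*k^2, k = floor((t-1)/5).

    Constraints: Z_1^T d = initial_effect, maximum over k at peak_day_index,
    and (1/T) sum_t Z_t^T d = avg_effect.
    """
    k = (np.arange(1, T + 1) - 1) // times_per_day
    m1, m2 = k.mean(), (k.astype(float) ** 2).mean()
    d1 = initial_effect
    d3 = (avg_effect - d1) / (m2 - 2 * peak_day_index * m1)
    if d3 >= 0:
        raise ValueError("inputs do not give a concave effect that peaks at peak_day_index")
    d2 = -2 * peak_day_index * d3            # vertex of the parabola at the peak
    return np.array([d1, d2, d3])


d = elicit_quadratic_d(avg_effect=0.10, peak_day_index=28)
k = np.arange(210) // 5
Z = np.column_stack([np.ones(210), k, k ** 2]).astype(float)
effect = Z @ d                               # standardized proximal effect at each t
```

The concavity check guards against elicited values that would put a minimum, rather than a maximum, at the stated peak.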
Table 1: Illustrative sample sizes for HeartSteps. The day of maximal treatment effect is 29. The expected availability is constant in t. $\bar d = (1/T)\sum_{t=1}^T Z_t^\top d$ is the average standardized treatment effect.

               Expected availability
 d̄          0.7    0.6    0.5    0.4
 0.10        32     36     42     52
 0.09        38     44     51     63
 0.08        47     54     64     78
 0.07        60     69     81    101
 0.06        79     92    109    135
 0.05       112    130    155    193

In the behavioral sciences, a standardized effect size of 0.2 is considered small [7]. Thus, given the very small standardized effect sizes, the sample sizes given in Table 1 seem unbelievably small. Two points are worth making in this regard. First, the use of the parametric alternative hypothesis (1) in forming the test statistic implies that both between-subject and within-subject contrasts in proximal responses are used to detect the alternative. To see this, note that if we focused on only the first time point, $t = 1$, and tested $H_0: \beta(1) = 0$, then an appropriate test would be a two-sample t-test based on the proximal response $Y_2$, in which case the required sample size would be much larger (akin to the sample size for a two-arm randomized controlled trial in which 40% of the subjects are randomized to the treatment arm). This two-sample t-test uses only between-subject contrasts in proximal response to test the hypothesis. The required sample size would be even larger for a test of $H_0: \beta(1) = 0, \beta(2) = 0$ in which no relationship between $\beta(1)$ and $\beta(2)$ is assumed. Conversely, the sample size would be smaller if one focused on detecting alternatives to $H_0: \beta(1) = 0, \beta(2) = 0$ of the form $H_1: \beta(1) = \beta(2) \neq 0$. The use of the alternative $\beta(1) = \beta(2) \neq 0$ allows one to construct tests that use both between-subject and within-subject contrasts in proximal responses. Our approach lies between these two extremes in that we focus on detecting smooth (in t) alternatives to $H_0: \beta(t) = 0$ for all t. This permits the use of both within- and between-subject contrasts in proximal responses. The assumption of a parsimonious alternative enables the use of smaller sample sizes. A second point is that, at this time, there is no general understanding of how large the standardized effect size should be for these “in-the-moment” effects of a treatment. Thus these standardized effects may or may not be considered small in the future.
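To make the first point concrete, the following Python sketch computes the usual normal-approximation sample size for a two-arm comparison with 40% of subjects allocated to treatment, i.e., the between-subject-only analysis at a single time point described above. It assumes, for simplicity, that every participant is available at $t = 1$, uses a two-sided level of 0.05, and replaces the t-test by its normal approximation; the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_arm_sample_size(effect_size, allocation=0.4, alpha0=0.05, power=0.80):
    """Total N for a two-sided two-sample z-test with unequal allocation.

    Var(difference in means) = sigma^2 / (allocation * (1 - allocation) * N), so
    N = (z_{1 - alpha0/2} + z_{power})^2 / (allocation * (1 - allocation) * effect_size^2).
    """
    z = NormalDist().inv_cdf
    n = (z(1 - alpha0 / 2) + z(power)) ** 2 / (allocation * (1 - allocation) * effect_size ** 2)
    return math.ceil(n)


n_single_time = two_arm_sample_size(0.10)   # about 3,300 participants
```

This is roughly two orders of magnitude above the entries of Table 1 for a comparable average standardized effect; the comparison is only heuristic, because the two analyses target different hypotheses.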


6 Simulations 

We consider a variety of simulations with different generative models to evaluate the performance of the sample size formulae. In the simulations presented here, we use the same setup as in HeartSteps; see the appendix for simulations in other setups (Table 4B). Specifically, the duration of the study is 42 days and there are 5 decision times within each day ($T = 210$). The randomization probability is 0.4, i.e., $p = p_t = P(A_t = 1) = 0.4$. The sample size formula is given in (8) and (9). All simulations are based on 1,000 simulated data sets.

Throughout this section the inputs to the sample size formula are $Z_t = (1, \lfloor (t-1)/5 \rfloor, \lfloor (t-1)/5 \rfloor^2)^\top$, the time-varying availability pattern $\tau_t = E[I_t]$, d, $\alpha_0 = 0.05$, and power $1 - \beta_0 = 0.80$. The value for the vector d is indirectly specified via (a) the time at which the maximal standardized proximal main effect is achieved ($\arg\max_t Z_t^\top d$), (b) the average across time of the standardized proximal main effect, $\bar d = (1/T)\sum_{t=1}^T Z_t^\top d$, and (c) no initial standardized proximal main effect ($Z_1^\top d = d_1 = 0$). The test statistic used to evaluate the sample size formula is given by (6), in which $B_t$ and $Z_t$ are both set to $(1, \lfloor (t-1)/5 \rfloor, \lfloor (t-1)/5 \rfloor^2)^\top$.

The simulation results provided below illustrate that the sample size formula and associated test statistic are robust. For convenience we summarize the results here. When the working assumptions hold, then under a variety of availability patterns, i.e., time-varying values for $\tau_t = E[I_t]$ (see Figure 1), the desired Type 1 error and power are preserved. This is also the case when past treatment impacts availability. Furthermore, the sample size formula is robust to deviations from the working assumptions, that is, it provides the desired Type 1 error and power; this is true for a variety of forms of the true proximal main effect of the treatment (see Figure …), a variety of distributions and correlation patterns for the errors, and dependence of $Y_{t+1}$ on past treatment. In all cases, this robustness holds as long as we provide an approximately true or conservative value for the standardized effect, d, and an approximately true or conservative (low) value for the availability, $E[I_t]$.
In our simulations, we note several areas in which the sample size formula is less robust to violations of working assumption (c), that is, when the error variance in $Y_{t+1}$ varies depending on whether treatment is 1 or 0, or varies with time t. In particular, if the ratio $\mathrm{Var}[Y_{t+1} \mid I_t = 1, A_t = 1] / \mathrm{Var}[Y_{t+1} \mid I_t = 1, A_t = 0] < 1$, then the power is reduced. Also, if the average variance, $E[\mathrm{Var}(Y_{t+1} \mid I_t = 1, A_t)]$, varies greatly with time t, then the power is reduced. See below for details. Lastly, as would be expected for any sample size formula, using values of the standardized effect size, d, or of the availability that are larger than the truth degrades the power of the procedure.

6.1 Working Assumptions Underlying Sample Size Formula are True 

First, we considered a variety of settings in which the working assumptions (a)-(d) hold and in which the inputs to the sample size formula are correct ($d$ is correct under the alternative hypothesis and the time-varying availability $E[I_t]$ is correct). Neither the working assumptions nor the inputs to the sample size formula specify the error distribution; thus in the simulation we consider 5 distributions for the errors in the model for $Y_{t+1}$, including independent normal, Student's t and exponential distributions as well as two autoregressive (AR) processes; all of these error patterns satisfy $\bar\sigma^2 = 1$ (recall $\bar\sigma^2 = \frac{1}{T}\sum_{t=1}^{T} E[\mathrm{Var}(Y_{t+1} \mid I_t = 1, A_t)]$). Furthermore, neither the working assumptions nor the inputs to the sample size formula specify the dependence of the availability indicator, $I_t$, on past treatment. Thus we consider settings in which the availability decreases as the number of recent treatments increases. For brevity, we provide these standard results in Appendix B (Tables 2B and 3B). The results are generally quite good, with very few Type 1 error rates significantly above .05 and very few power levels significantly below .80.
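The table notes flag simulated rates that are "significantly" above .05 or below .80, but the test behind those flags is not stated. As a rough guide, here is a minimal sketch assuming a one-sided normal-approximation binomial test at level .05 over the 1,000 simulated data sets used throughout; the function name is ours, not the paper's.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_cutoff(target, n_reps=1000, level=0.05, upper=True):
    """Cut-off beyond which a Monte Carlo rate differs significantly from `target`.

    Normal approximation to the binomial: with n_reps independent simulated
    data sets, the estimated rate has standard error
    sqrt(target * (1 - target) / n_reps) when the true rate equals `target`.
    """
    z = NormalDist().inv_cdf(1.0 - level)          # one-sided critical value
    se = sqrt(target * (1.0 - target) / n_reps)
    return target + z * se if upper else target - z * se


# Type I error: flag estimates significantly ABOVE the nominal .05.
print(round(one_sided_cutoff(0.05, upper=True), 4))    # 0.0613
# Power: flag estimates significantly BELOW the target .80.
print(round(one_sided_cutoff(0.80, upper=False), 4))   # 0.7792
```

Under that assumption, Type I error rates above roughly 6.1% and power below roughly 77.9% would be flagged.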


Figure 1: Availability Patterns. The x-axis is the decision time point and the y-axis is the expected availability. Pattern 2 represents availability varying by day of the week, with higher availability on the weekends and lower availability mid-week. The average availability is 0.5 in all cases.

[Figure 1 image: four panels (Patterns 1-4); x-axis: Time (decision time point, 0-200); y-axis: expected availability (0.40-0.60).]


6.2 Working Assumptions Underlying Sample Size Formula are False 

Second, we considered a variety of settings in which the working assumptions are false but the inputs to the sample size formula are approximately correct, as follows. Throughout, $\bar\sigma^2 = 1$.


6.2.1 Working Assumption (a) is Violated. 

Suppose that the true $E[Y_{t+1} \mid I_t = 1] \neq B_t'\alpha$ for any $\alpha \in \mathbb{R}^3$. In particular, we consider the scenario in which there is a "weekend" effect on $Y_{t+1}$; see Appendix B for another scenario. The data are generated as follows.


$$I_t \sim \mathrm{Ber}(\tau_t), \qquad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t'd + \epsilon_t, \quad \text{if } I_t = 1$$


where the conditional mean is $\alpha(t) = B_t'\alpha + W_t\theta$. $W_t$ is a binary variable: $W_t = 1$ if the day of the week at time $t$ is a weekend day, and $W_t = 0$ if the day is a weekday. For simplicity, we assume each subject starts on Monday, i.e. for $k = 1, \dots, 6$, $W_{l + 35(k-1)} = 0$ when $l = 1, \dots, 25$, and $W_{l + 35(k-1)} = 1$ when $l = 26, \dots, 35$ (recall that we assume in the simulation that there are 5 decision time points per day and the length of the study is 6 weeks). The values of $\{\alpha_i, i = 1, 2, 3\}$ are determined by setting $\alpha(1) = 2.5$, $\arg\max_t \alpha(t) = T$ and $\frac{1}{T}\sum_{t=1}^{T}\alpha(t) - \alpha(1) = 0.1$. The error terms $\{\epsilon_t\}_{t=1}^{T}$ are i.i.d. $N(0,1)$. The day of maximal proximal effect is 29. Additionally, different values of $\theta$ and of the average standardized treatment effect, together with four patterns of availability shown in Figure 1 (each with average 0.5), are considered. The Type I error rate is not affected and is therefore omitted here. The simulated power is reported in Table II; for more details see Table 6B in Appendix B.
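The indexing of $W_t$ is easy to get wrong, so here is a minimal sketch of it, assuming the 6-week, 5-decisions-per-day layout stated above and a Monday start; the helper name is hypothetical.

```python
def weekend_indicator(n_weeks=6, times_per_day=5):
    """Return [W_1, ..., W_T] with W_t = 1 if decision time t falls on a weekend.

    Assumes every subject starts on a Monday, so within week k the positions
    l = 1..25 (Mon-Fri) have W_{l+35(k-1)} = 0 and l = 26..35 (Sat-Sun) have 1.
    """
    per_week = 7 * times_per_day          # 35 decision times per week
    weekday_times = 5 * times_per_day     # the first 25 of them are Mon-Fri
    W = []
    for _week in range(n_weeks):
        for l in range(1, per_week + 1):
            W.append(1 if l > weekday_times else 0)
    return W


W = weekend_indicator()
print(len(W), sum(W))   # 210 60
```

`W[t - 1]` holds $W_t$, so `W[25]` is the first weekend decision time ($t = 26$).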
Table II: Simulated power when working assumption (a) is violated. The patterns of availability are provided in Figure 1.

                       Availability Pattern
θ        d̄       Pattern 1   Pattern 2   Pattern 3
0.5 d̄    0.10    0.80        0.79        0.81
         0.06    0.78        0.83        0.81
1 d̄      0.10    0.79        0.78        0.78
         0.06    0.78        0.79        0.79
1.5 d̄    0.10    0.78        0.81        0.78
         0.06    0.77        0.81        0.82
2 d̄      0.10    0.78        0.79        0.79
         0.06    0.81        0.79        0.78

$\theta$ is the coefficient of $W_t$ in $E[Y_{t+1} \mid I_t = 1]$. $\bar d = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$ is the average standardized treatment effect. Bold numbers are significantly (at .05 level) lower than .80.


6.2.2 Working Assumption (b) is Violated. 

Suppose that the true $\beta(t) \neq Z_t'\beta$ for any $\beta$. Instead, the vector of standardized effects, $d$, used in the sample size formula corresponds to the projection of $d(t)$, that is, $d = \left(\sum_{t=1}^{T} E[I_t] Z_t Z_t'\right)^{-1} \sum_{t=1}^{T} E[I_t] Z_t d(t)$ (recall $d(t) = \beta(t)/\bar\sigma$ and $\rho_t = \rho$). The sample size formula is used with the correct availability pattern, $\{E[I_t]\}_{t=1}^{T}$. The data for each simulated subject are generated sequentially as follows. For each time $t$,




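$$I_t \sim \mathrm{Ber}(\tau_t), \qquad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \alpha(t) + (A_t - \rho)\, d(t) + \epsilon_t, \quad \text{if } I_t = 1$$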


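for the variety of $d(t) = \beta(t)/\bar\sigma$ and $E[I_t]$ patterns provided in Figure 2 and Figure 1, respectively. The average availability is 0.5. The error terms are generated as i.i.d. $N(0,1)$. The conditional mean, $E[Y_{t+1} \mid I_t = 1] = \alpha(t)$, is given by $\alpha(t) = \alpha_1 + \alpha_2 \lfloor (t-1)/5 \rfloor + \alpha_3 \lfloor (t-1)/5 \rfloor^2$, where $\alpha_1 = 2.5$, $\alpha_2 = 0.0727$, $\alpha_3 = -8.66 \times 10^{-4}$ (so that $\frac{1}{T}\sum_{t=1}^{T}\alpha(t) - \alpha(1) = 1$ and $\arg\max_t \alpha(t) = T$).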
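The projection defining $d$ is a weighted least-squares fit of $d(t)$ on $Z_t$ with weights $E[I_t]$ (the constant factor $\rho(1-\rho)$ cancels). A minimal sketch in plain Python, assuming $Z_t = (1, k, k^2)$ with day index $k = \lfloor (t-1)/5 \rfloor$ and $T = 210$ as in the simulations; all function names and the plateau-shaped example effect are hypothetical.

```python
def z_vec(t, times_per_day=5):
    """Z_t = (1, k, k^2) with day index k = floor((t - 1) / 5)."""
    k = (t - 1) // times_per_day
    return [1.0, float(k), float(k * k)]


def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][j] * x[j] for j in range(r + 1, n))) / A[r][r]
    return x


def project_effect(d_of_t, avail, T=210):
    """d = (sum_t E[I_t] Z_t Z_t')^{-1} sum_t E[I_t] Z_t d(t)."""
    M = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for t in range(1, T + 1):
        z, w, dt = z_vec(t), avail(t), d_of_t(t)
        for i in range(3):
            b[i] += w * z[i] * dt
            for j in range(3):
                M[i][j] += w * z[i] * z[j]
    return solve_linear(M, b)


def plateau_effect(t):
    """Hypothetical d(t): rises linearly in the day index, flat from day 29 on."""
    return 0.004 * min((t - 1) // 5, 28)


print(project_effect(plateau_effect, avail=lambda t: 0.5))
```

When $d(t)$ is itself quadratic in the day index, the projection returns its coefficients up to rounding, which is a convenient sanity check.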


Table III: Simulated power when working assumption (b) is violated. The shape of the standardized proximal effect and the pattern for availability are provided in Figure 2 and Figure 1, respectively. The sample sizes are given on the right.

                                     Power (shape of d(t))       Sample Size (shape of d(t))
d̄       Availability   Max    Maintained   Degraded       Maintained   Degraded
0.10    Pattern 1      15     0.78         0.79           43           39
                       29     0.80         0.79           38           38
        Pattern 2      15     0.79         0.80           43           39
                       29     0.78         0.79           38           38
        Pattern 3      15     0.81         0.77           45           41
                       29     0.81         0.78           37           39
0.06    Pattern 1      15     0.81         0.79           111          100
                       29     0.81         0.79           96           96
        Pattern 2      15     0.79         0.81           112          100
                       29     0.79         0.80           96           96
        Pattern 3      15     0.78         0.81           116          106
                       29     0.80         0.80           95           101

$\bar d = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$ is the average standardized treatment effect. "Max" refers to the day of maximal proximal effect. Bold numbers are significantly (at .05 level) lower than .80.
Figure 2: Proximal Main Effects of Treatment, representing maintained and severely degraded time-varying proximal treatment effects. The horizontal axis is the decision time point. The vertical axis is the standardized treatment effect. The "Max" in the titles refers to the day of maximal proximal effect. The average standardized proximal effect is $\bar d = 0.1$ in all plots.
The simulated powers are provided in Table III. In all cases the power is close to .80; this is because all of the proximal main effect patterns in Figure 2 are sufficiently well approximated by a quadratic in time. See Appendix B for other cases of $d(t)$ and details (Table 9B and the accompanying figure).


6.2.3 Working Assumption (c) is Violated. 

Suppose that $\mathrm{Var}[Y_{t+1} \mid I_t = 1, A_t] = A_t\sigma_{1t}^2 + (1 - A_t)\sigma_{0t}^2$, where $\sigma_{1t}/\sigma_{0t} \neq 1$. The sample size formula is used with the correct pattern for $\{Z_t'd, E[I_t]\}_{t=1}^{T}$. The data for each simulated subject are generated sequentially as follows.

For each time $t$,


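$$I_t \sim \mathrm{Ber}(\tau_t), \qquad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t'd + \mathbf{1}\{A_t = 1\}\,\sigma_{1t}\epsilon_t + \mathbf{1}\{A_t = 0\}\,\sigma_{0t}\epsilon_t, \quad \text{if } I_t = 1$$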

where the average across time standardized proximal main effect, $\bar d = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$, is 0.1 and the day of maximal effect is equal to 22 or 29. The function $\alpha(t) = E[Y_{t+1} \mid I_t = 1]$ is as in the prior simulation. The availability is $\tau_t = 0.5$. The error terms $\{\epsilon_t\}$ follow a normal AR(1) process, i.e. $\epsilon_t = \phi\epsilon_{t-1} + \nu_t$, with the variance of $\nu_t$ scaled so that $\mathrm{Var}(\epsilon_t) = 1$. Define $\sigma_t^2 = E[\mathrm{Var}[Y_{t+1} \mid I_t = 1, A_t]]$ $(= \rho\sigma_{1t}^2 + (1-\rho)\sigma_{0t}^2)$. Recall the average variance is given by $\bar\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}\sigma_t^2$. We consider time-varying trends for $\{\sigma_t\}$ together with different values of $\sigma_{1t}/\sigma_{0t}$; see Figure 3. In each trend, $\sigma_t$ is scaled such that $\bar\sigma = 1$; thus the standardized proximal main effect in the generative model is $Z_t'd$. In all cases, the simulated Type I error rates are close to .05, and thus the table is omitted here (see Appendix B, Table 10B). The simulated power is given in Table IV.
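Two ingredients of this generative model can be sketched as follows, assuming $\rho = 0.4$ as in Appendix B: AR(1) errors with unit marginal variance, which requires $\mathrm{Var}(\nu_t) = 1 - \phi^2$, and a split of a target $\sigma_t$ into $\sigma_{1t}$ and $\sigma_{0t}$ with a prescribed ratio so that $\rho\sigma_{1t}^2 + (1-\rho)\sigma_{0t}^2 = \sigma_t^2$. The function names are hypothetical, and the stationary start $\epsilon_1 \sim N(0,1)$ is our assumption.

```python
import random
from math import sqrt


def ar1_errors(T, phi, rng):
    """Gaussian AR(1) with unit marginal variance: eps_t = phi * eps_{t-1} + nu_t.

    Var(nu_t) = 1 - phi**2 keeps Var(eps_t) = 1; eps_1 is drawn from the
    stationary N(0, 1) distribution so the series starts in equilibrium.
    """
    sd_nu = sqrt(1.0 - phi ** 2)
    eps = [rng.gauss(0.0, 1.0)]
    for _ in range(T - 1):
        eps.append(phi * eps[-1] + rng.gauss(0.0, sd_nu))
    return eps


def arm_sds(sigma_t, ratio, rho=0.4):
    """Return (sigma_1t, sigma_0t) with sigma_1t / sigma_0t = ratio and
    rho * sigma_1t**2 + (1 - rho) * sigma_0t**2 = sigma_t**2."""
    sigma_0 = sigma_t / sqrt(rho * ratio ** 2 + (1.0 - rho))
    return ratio * sigma_0, sigma_0


rng = random.Random(2024)
eps = ar1_errors(210, phi=-0.6, rng=rng)   # one subject's error sequence
s1, s0 = arm_sds(1.0, ratio=1.2)           # sigma_1t / sigma_0t = 1.2
```

Within a trend, each $\sigma_t$ would be passed through `arm_sds` before scaling the error at time $t$.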


Table IV: Simulated power when working assumption (c) is violated, $\sigma_{1t}/\sigma_{0t} \neq 1$. The trends are provided in Figure 3. The availability is 0.5. The average proximal main effect is $\bar d = 0.1$ and the day of maximal effect is 22 or 29; thus the associated sample sizes are 41 and 42.

                          Max = 22 (N = 41)             Max = 29 (N = 42)
φ       σ1t/σ0t     trend 1  trend 2  trend 3     trend 1  trend 2  trend 3
-0.6    0.8         0.83     0.84     0.80        0.81     0.89     0.79
        1.0         0.79     0.80     0.75        0.74     0.85     0.70
        1.2         0.76     0.76     0.71        0.72     0.81     0.70
0       0.8         0.85     0.82     0.79        0.81     0.88     0.78
        1.0         0.79     0.81     0.74        0.77     0.86     0.72
        1.2         0.77     0.77     0.71        0.70     0.83     0.70
0.6     0.8         0.83     0.83     0.81        0.77     0.87     0.77
        1.0         0.76     0.79     0.75        0.73     0.85     0.77
        1.2         0.78     0.77     0.73        0.72     0.82     0.69

$\phi$ is the parameter in the AR(1) process for $\{\epsilon_t\}_{t=1}^{T}$. "Max" is the day on which the maximal proximal effect is attained. Bold numbers are significantly (at .05 level) lower than .80.
Figure 3: Trend of $\sigma_t$. For all trends, $\sigma_t$ is scaled so that $\bar\sigma = 1$. In Trend 3, the variance, $\sigma_t^2 = E[\mathrm{Var}[Y_{t+1} \mid I_t = 1, A_t]]$, peaks on weekends. In particular, $\sigma_{7k+i} = 0.8$ for $i = 1, \dots, 5$ and $\sigma_{7k+i} = 1.5$ for $i = 6, 7$.

[Figure 3 image: three panels, Trend 1, Trend 2 and Trend 3.]


In the case of $\sigma_{1t} < \sigma_{0t}$, the simulated powers are slightly larger than 0.8, while the simulated powers are smaller than 0.8 in the case of $\sigma_{1t} > \sigma_{0t}$. The impact of $\sigma_t$ on the power depends on the shape of the treatment effect: when $\beta(t)$ attains its maximum more than halfway through the study (at day 29), an increasing $\{\sigma_t\}$ (trend 1) lowers the power, while a decreasing $\{\sigma_t\}$ (trend 2) improves the power. When $\beta(t)$ attains its maximal effect midway through the study, neither a decreasing nor an increasing $\{\sigma_t\}$ affects power. A large variation in $\sigma_t$, e.g. trend 3, reduces the power in all cases. The differing autocorrelations of the errors, $\epsilon_t$, do not affect power; see the more detailed Table 10B in Appendix B.


6.2.4 Working Assumption (d) is Violated 


We violate assumption (d) by making both the availability indicator, $I_t$, and the proximal response, $Y_{t+1}$, depend on past treatment and past proximal responses. The sample size formula is used with the correct value of $\{Z_t'd, E[I_t]\}_{t=1}^{T}$; in particular, $d$ is determined by an average proximal main effect of $\bar d = 0.1$ and a day of maximal effect equal to 29 ($d_1 = 0$, $d_2 = 9.64 \times 10^{-3}$, $d_3 = -1.72 \times 10^{-4}$), with a constant availability pattern equal to 0.5. The data for each simulated subject are generated as follows. Denote the cumulative treatment over the last 24 hours by $C_t = \sum_{j=1}^{5} A_{t-j}$. At each time $t$,

$$I_t \sim \mathrm{Ber}\Big(\tau_t + \tau_t\eta_1 (C_t - E[C_t]) + \tau_t\eta_2\,\mathrm{Trunc}\Big(\frac{1}{5}\sum_{j=1}^{5}\big(Y_{t-j+1} - \alpha_0(t-j)\big)\Big)\Big), \qquad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \begin{cases} \alpha(t) + \gamma_1 (C_t - E[C_t \mid I_t = 1]) + (A_t - \rho)\big[Z_t'd + Z_t'd\,\gamma_2 (C_t - E[C_t \mid I_t = 1])\big] + \sigma^*\epsilon_t & \text{if } I_t = 1 \\ \alpha_0(t) + \epsilon_t & \text{if } I_t = 0 \end{cases}$$

where the $\epsilon_t$ are i.i.d. $N(0,1)$ and $\mathrm{Trunc}(x) := x\,\mathbf{1}\{|x| \le 1\} + \mathrm{sign}(x)\,\mathbf{1}\{|x| > 1\}$ (the truncation is used to ensure that the success probability in the model for $I_t$ lies in $[0,1]$). Again $\alpha(t)$ is as in the prior simulation, and $\sigma^*$ is calculated such that the average variance is equal to 1, i.e. $\bar\sigma^2 = \frac{1}{T}\sum_{t=1}^{T} E[\mathrm{Var}[Y_{t+1} \mid I_t = 1, A_t]] = 1$. Note that since $C_t$ is centered in both the model for $I_t$ and the model for $Y_{t+1}$, the standardized proximal main effect is $Z_t'd$ and $E[I_t] = \tau_t = 0.5$. $\alpha_0(t)$ is the conditional mean of $Y_{t+1}$ when $I_t = 0$. The form of $E[Y_{t+1} \mid I_t = 0]$ is not essential: only $Y_{s+1} - E[Y_{s+1} \mid I_s = 0]$ is used to generate $I_t$. In the simulation, $E[C_t \mid I_t = 1]$ and $\sigma^*$ are calculated by Monte Carlo methods. As before, the simulated Type I error is not affected; see Table 11B in Appendix B. The simulated powers are provided in Table V.
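The coefficients $d_2 = 9.64 \times 10^{-3}$ and $d_3 = -1.72 \times 10^{-4}$ follow from the three design inputs ($d_1 = 0$, maximal effect on day 29, average effect 0.1). A minimal sketch, assuming $Z_t = (1, k, k^2)$ with day index $k = \lfloor (t-1)/5 \rfloor$, 42 days with 5 decision times per day, and that "day 29" means $k = 28$; the function name is hypothetical.

```python
def quadratic_effect(avg_effect=0.1, max_day=29, n_days=42, times_per_day=5):
    """Coefficients (d1, d2, d3) of d(t) = d1 + d2*k + d3*k^2, k = floor((t-1)/5),
    with d1 = 0, maximum on day `max_day`, and time-average equal to avg_effect."""
    ks = [(t - 1) // times_per_day for t in range(1, n_days * times_per_day + 1)]
    m1 = sum(ks) / len(ks)                  # average of k over t = 1..T
    m2 = sum(k * k for k in ks) / len(ks)   # average of k^2 over t = 1..T
    k_star = max_day - 1                    # day 29 has day index k = 28
    # Vertex at k_star: -d2 / (2 d3) = k_star, so d2 = -2 * k_star * d3.
    # Average constraint: d2 * m1 + d3 * m2 = avg_effect.
    d3 = avg_effect / (m2 - 2 * k_star * m1)
    d2 = -2 * k_star * d3
    return 0.0, d2, d3


d1, d2, d3 = quadratic_effect()
print(f"d2 = {d2:.3e}, d3 = {d3:.3e}")   # d2 = 9.641e-03, d3 = -1.722e-04
```

The vertex condition fixes the ratio $d_2/d_3$, and the averaging constraint then fixes the scale.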


Table V: Simulated power when working assumption (d) is false. The expected availability is 0.5, the average proximal main effect is $\bar d = 0.1$ and the maximal effect is attained at day 29. The associated sample size is 42.

Parameters in I_t        γ2      γ1 = -0.1   γ1 = -0.2   γ1 = -0.3
η1 = -0.1, η2 = -0.1     -0.2    0.80        0.81        0.79
                         -0.5    0.79        0.81        0.80
                         -0.8    0.81        0.82        0.79
η1 = -0.2, η2 = -0.1     -0.2    0.78        0.82        0.79
                         -0.5    0.81        0.77        0.77
                         -0.8    0.81        0.79        0.78
η1 = -0.1, η2 = -0.2     -0.2    0.78        0.78        0.80
                         -0.5    0.80        0.79        0.78
                         -0.8    0.78        0.79        0.80

$\gamma_1, \gamma_2$ are parameters for the cumulative treatments in the model of $Y_{t+1}$; $\eta_1, \eta_2$ are parameters in the model of $I_t$. Bold numbers are significantly (at .05 level) less than .80.
6.3 Some Practical Guidelines


Third, it is critical to use conservative values of $d$ and of the availability $E[I_t]$ in the sample size formula. It is not surprising that the quality of the sample size formula depends on accurate or conservative values of the standardized effects, $d$, as this is the case for all sample size formulas. Additionally, availability determines the number of decision points at which treatment might be provided per individual, and thus the sample size formula should be sensitive to availability. To illustrate these points we consider a simulation in which the data are generated by

$$I_t \sim \mathrm{Ber}(\tau_t), \qquad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t'd + \epsilon_t, \quad \text{if } I_t = 1$$

where the $\epsilon_t$'s are i.i.d. standard normals and $\alpha(t)$ is as in the prior simulations. First, suppose the scientist provides the correct availability pattern, the correct time at which the maximal standardized proximal main effect is achieved ($\arg\max_t Z_t'd$) and the correct initial standardized proximal main effect ($Z_1'd = d_1 = 0$), but provides too low a value of the averaged across time, standardized proximal main effect $\bar d = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$. The simulated power is provided in Appendix B, Table 12B. The degradation in power is pronounced, as might be expected.

Second, suppose the scientist provides the correct $\arg\max_t Z_t'd$, the correct $Z_1'd = d_1 = 0$ and the correct $\bar d = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$, and, although the scientist's time-varying pattern of availability is correct, the magnitude is underestimated. The simulation result is in Appendix B, Table 13B. Again the degradation in power is pronounced.


7 Discussion 

In this paper, we have introduced the use of micro-randomized trials in mobile health and have provided an approach to determining the sample size. More sophisticated sample size procedures might be entertained. Certainly it makes sense to include baseline information in the sample size procedure; for example, in HeartSteps a natural baseline variable is baseline step count. The inclusion of baseline variables in $B_t$ in the regression (2) is straightforward. An interesting generalization of the sample size procedure would allow scientists to include time-varying variables (in $S_t$) as covariates in $B_t$ in the same regression. This might be a useful strategy for reducing the error variance.

Although this paper has focused on determining the sample size to detect the proximal main effect of a treatment with a given power, micro-randomized studies provide data for a variety of interesting further analyses. For example, it is of some interest to model and understand the predictors of the time-varying availability indicator. In the case of HeartSteps we will know why the participant is unavailable (driving a car, already active, or has turned off the lock-screen messages), so we will be able to consider each type of availability indicator. Other very interesting further analyses include assessing interactions between the treatment, $A_t$, and the context, $S_t$, or past treatment, $A_s$, $s < t$, on the proximal response, $Y_{t+1}$. Also, there is much interest in using this type of data to construct "dynamic treatment regimes"; in this setting these are called Just-in-Time Adaptive Interventions. The sequential micro-randomizations enhance all of these analyses by reducing causal confounding.
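Appendix A Theoretical Results and Proofs

Lemma 1 (Least Squares Estimator). The least squares estimators $\hat\alpha$, $\hat\beta$ are consistent estimators of $\alpha$, $\beta$ as defined in the main text. In particular, if $\beta(t) = Z_t'\beta^*$ for some vector $\beta^*$, then $\beta = \beta^*$. Under moment conditions, we have $\sqrt{N}(\hat\beta - \beta) \to N(0, \Sigma_\beta)$ in distribution, where the asymptotic variance is $\Sigma_\beta = Q^{-1} W Q^{-1}$ with $Q = \sum_{t=1}^{T} E[I_t]\rho_t(1-\rho_t) Z_t Z_t'$,

$W = E\left[\sum_{t=1}^{T} \epsilon_t I_t (A_t - \rho_t) Z_t \times \sum_{t=1}^{T} \epsilon_t I_t (A_t - \rho_t) Z_t'\right]$ and $\epsilon_t = Y_{t+1} - B_t'\alpha - Z_t'\beta\,(A_t - \rho_t)$.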


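Proof. It is easy to see that the least squares estimators satisfy

$$\hat\theta = (\hat\alpha, \hat\beta) = \Big(\mathbb{P}_N \sum_{t=1}^{T} I_t X_t X_t'\Big)^{-1}\Big(\mathbb{P}_N \sum_{t=1}^{T} I_t Y_{t+1} X_t\Big) \xrightarrow{p} \Big(\sum_{t=1}^{T} E[I_t X_t X_t']\Big)^{-1}\Big(\sum_{t=1}^{T} E[I_t Y_{t+1} X_t]\Big),$$

where $X_t' = (B_t', (A_t - \rho_t) Z_t')$ is the covariate at time $t$ and $\mathbb{P}_N$ denotes the empirical average over the $N$ individuals. For each $t$,

$$E[I_t X_t X_t'] = \begin{pmatrix} E[I_t] B_t B_t' & B_t Z_t' E[I_t (A_t - \rho_t)] \\ Z_t B_t' E[I_t (A_t - \rho_t)] & Z_t Z_t' E[I_t (A_t - \rho_t)^2] \end{pmatrix} = \begin{pmatrix} E[I_t] B_t B_t' & 0 \\ 0 & E[I_t]\rho_t(1-\rho_t) Z_t Z_t' \end{pmatrix},$$

$$E[I_t Y_{t+1} X_t] = \begin{pmatrix} E[I_t Y_{t+1}] B_t \\ E[I_t Y_{t+1} (A_t - \rho_t)] Z_t \end{pmatrix} = \begin{pmatrix} E[I_t Y_{t+1}] B_t \\ E[I_t]\rho_t(1-\rho_t)\beta(t) Z_t \end{pmatrix},$$

so that

$$\alpha = \Big(\sum_{t=1}^{T} E[I_t] B_t B_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t Y_{t+1}] B_t, \qquad \beta = \Big(\sum_{t=1}^{T} \rho_t(1-\rho_t) E[I_t] Z_t Z_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t]\rho_t(1-\rho_t)\beta(t) Z_t,$$

as in the main text. We can see that if $\beta(t) = Z_t'\beta^*$, then $\beta = \big(\sum_{t=1}^{T}\rho_t(1-\rho_t)E[I_t]Z_tZ_t'\big)^{-1}\sum_{t=1}^{T}E[I_t]\rho_t(1-\rho_t)Z_tZ_t'\beta^* = \beta^*$. This is true even if $E[Y_{t+1} \mid I_t = 1] \neq B_t'\alpha$.

We can easily see that

$$\sqrt{N}(\hat\theta - \theta) = \sqrt{N}\Big(\mathbb{P}_N \sum_{t=1}^{T} I_t X_t X_t'\Big)^{-1}\Big[\mathbb{P}_N \sum_{t=1}^{T} I_t Y_{t+1} X_t - \Big(\mathbb{P}_N \sum_{t=1}^{T} I_t X_t X_t'\Big)\theta\Big] = \Big(E\Big[\sum_{t=1}^{T} I_t X_t X_t'\Big]\Big)^{-1}\sqrt{N}\,\mathbb{P}_N \sum_{t=1}^{T} I_t \epsilon_t X_t + o_p(1), \qquad (10)$$

where $o_p(1)$ is a term that converges in probability to zero as $N$ goes to infinity. By the definition of $\alpha$ and $\beta$ we have

$$E\Big[\sum_{t=1}^{T} I_t \epsilon_t X_t\Big] = \begin{pmatrix} 0 \\ \sum_{t=1}^{T} E[I_t]\rho_t(1-\rho_t)\big(\beta(t) - Z_t'\beta\big) Z_t \end{pmatrix} = 0.$$

So, under moment conditions, we have $\sqrt{N}(\hat\theta - \theta) \to N(0, \Sigma_\theta)$ in distribution, where $\Sigma_\theta$ is given by

$$\Sigma_\theta = E\Big[\sum_{t=1}^{T} I_t X_t X_t'\Big]^{-1} E\Big[\sum_{t=1}^{T} I_t \epsilon_t X_t \times \sum_{t=1}^{T} I_t \epsilon_t X_t'\Big] E\Big[\sum_{t=1}^{T} I_t X_t X_t'\Big]^{-1}.$$

In particular, $\hat\beta$ satisfies $\sqrt{N}(\hat\beta - \beta) \to N(0, \Sigma_\beta)$, and $\Sigma_\beta$ is given by

$$\Sigma_\beta = \Big(\sum_{t=1}^{T} E[I_t]\rho_t(1-\rho_t) Z_t Z_t'\Big)^{-1} E\Big[\sum_{t=1}^{T} \epsilon_t I_t (A_t - \rho_t) Z_t \times \sum_{t=1}^{T} \epsilon_t I_t (A_t - \rho_t) Z_t'\Big]\Big(\sum_{t=1}^{T} E[I_t]\rho_t(1-\rho_t) Z_t Z_t'\Big)^{-1} = Q^{-1} W Q^{-1}.$$

□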
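The claim that $\hat\beta$ stays consistent even when $E[Y_{t+1} \mid I_t = 1] \neq B_t'\alpha$ can be checked numerically. A minimal sketch, assuming the simplest design $B_t = Z_t = 1$ with a constant effect $\beta = 0.3$, $\rho = 0.4$, $\tau_t = 0.5$ and a hypothetical weekend shift in the mean that the intercept-only $B_t$ cannot capture; all values are illustrative and are not taken from the simulations above.

```python
import random


def simulate_beta_hat(beta=0.3, rho=0.4, tau=0.5, n_subjects=1000, T=210, seed=1):
    """Least-squares beta_hat from regressing Y_{t+1} on (1, A_t - rho) over
    available decision times (B_t = Z_t = 1), pooling all subjects."""
    rng = random.Random(seed)
    n = sx = sy = sxx = sxy = 0.0
    for _ in range(n_subjects):
        for t in range(1, T + 1):
            if rng.random() >= tau:                 # I_t = 0: not available
                continue
            a = 1.0 if rng.random() < rho else 0.0  # A_t ~ Ber(rho)
            x = a - rho                             # centered treatment
            day = (t - 1) // 5
            # Weekend bump in E[Y | I = 1]: the intercept-only B_t cannot represent it.
            mean_t = 2.5 + (0.5 if day % 7 >= 5 else 0.0)
            y = mean_t + beta * x + rng.gauss(0.0, 1.0)
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    # OLS slope with an intercept: centered cross-product over centered sum of squares.
    return (sxy - sx * sy / n) / (sxx - sx * sx / n)


print(round(simulate_beta_hat(), 2))
```

The misspecified mean only inflates the residual variance: because $A_t$ is randomized independently of the weekend indicator, the cross term $E[I_t(A_t - \rho)W_t]$ is zero, which is the block-diagonal structure of $E[I_t X_t X_t']$ used in the proof.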


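Lemma 2 (Asymptotic Variance Under Working Assumptions). Assuming the working assumptions are true, then under the alternative hypothesis $H_1$, $\Sigma_\beta$ and $c_N$ are given by

$$\Sigma_\beta = \bar\sigma^2\Big(\sum_{t=1}^{T} E[I_t]\rho_t(1-\rho_t) Z_t Z_t'\Big)^{-1}, \qquad c_N = N d'\Big(\sum_{t=1}^{T} E[I_t]\rho_t(1-\rho_t) Z_t Z_t'\Big) d.$$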
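Proof. Note that under the working assumptions, we have $Z_t'\beta = \beta(t)$ and $\mathrm{Var}(Y_{t+1} \mid I_t = 1, A_t) = \bar\sigma^2$ for each $t$, and $\beta/\bar\sigma = d$. The middle term, $W$, in $\Sigma_\beta$ can be separated into two terms, i.e.

$$E\Big[\sum_{t=1}^{T}\epsilon_t I_t(A_t-\rho_t)Z_t \times \sum_{t=1}^{T}\epsilon_t I_t(A_t-\rho_t)Z_t'\Big] = \sum_{t=1}^{T} E[\epsilon_t^2 I_t(A_t-\rho_t)^2]\, Z_t Z_t' + \sum_{i \neq j} E[\epsilon_i\epsilon_j I_i I_j (A_i-\rho_i)(A_j-\rho_j)]\, Z_i Z_j'.$$

Under the working assumptions, we have $E[\epsilon_t \mid I_t = 1, A_t] = 0$ and $E[\epsilon_t^2 I_t (A_t - \rho_t)^2] = E[I_t]\rho_t(1-\rho_t)\bar\sigma^2$. Furthermore, suppose $i > j$; then

$$E[\epsilon_i\epsilon_j I_i I_j (A_i-\rho_i)(A_j-\rho_j)] = E[I_i I_j (A_j-\rho_j)(A_i-\rho_i)] \times E[\epsilon_i\epsilon_j \mid I_i = 1, I_j = 1] = 0,$$

because $A_i \perp \{I_i, I_j, A_j\}$, so that the first factor is 0. $W$ is then given by

$$W = \bar\sigma^2 \sum_{t=1}^{T} E[I_t]\rho_t(1-\rho_t) Z_t Z_t',$$

so that $\Sigma_\beta = \bar\sigma^2\big(\sum_{t=1}^{T}E[I_t]\rho_t(1-\rho_t)Z_tZ_t'\big)^{-1}$ and $c_N = N(\bar\sigma d)'\Sigma_\beta^{-1}(\bar\sigma d) = N d'\big(\sum_{t=1}^{T}E[I_t]\rho_t(1-\rho_t)Z_tZ_t'\big)d$. □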
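Since $d'Z_tZ_t'd = (Z_t'd)^2 = d(t)^2$, the noncentrality parameter in Lemma 2 reduces to $c_N = N\sum_{t=1}^{T}E[I_t]\rho_t(1-\rho_t)\,d(t)^2$, which makes plain why the sample size formula is sensitive to the availability input. A minimal sketch, assuming constant $\rho_t = 0.4$, $Z_t = (1, k, k^2)$ with $k = \lfloor (t-1)/5 \rfloor$, $T = 210$ and the quadratic effect $d = (0, 9.64 \times 10^{-3}, -1.72 \times 10^{-4})$ used in Section 6.2.4; the function name is hypothetical.

```python
def noncentrality(N, d, avail, rho=0.4, T=210, times_per_day=5):
    """c_N = N d' (sum_t E[I_t] rho (1 - rho) Z_t Z_t') d, computed via the identity
    d' Z_t Z_t' d = (Z_t' d)^2 with Z_t = (1, k, k^2), k = floor((t - 1) / 5)."""
    total = 0.0
    for t in range(1, T + 1):
        k = (t - 1) // times_per_day
        effect = d[0] + d[1] * k + d[2] * k * k   # Z_t' d, the standardized effect d(t)
        total += avail(t) * effect ** 2
    return N * rho * (1.0 - rho) * total


d = (0.0, 9.64e-3, -1.72e-4)
c_true = noncentrality(42, d, avail=lambda t: 0.5)
c_low = noncentrality(42, d, avail=lambda t: 0.4)
print(round(c_low / c_true, 6))   # 0.8: c_N is linear in the availability
```

For fixed $N$, $c_N$ scales in proportion to $E[I_t]$, so an overstated availability makes the formula recommend too few participants.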

Remark: Working assumption (d) can be replaced by assuming that $E[Y_{t+1} \mid I_t = 1, A_t, I_s = 1, A_s] - E[Y_{t+1} \mid I_t = 1, A_t]$ does not depend on $A_s$ for any $s < t$, or by a Markov-type assumption, $Y_{t+1} \perp \{Y_{s+1}, I_s, A_s, s < t\} \mid I_t, A_t$. Either of them implies $E[\epsilon_i\epsilon_j I_i I_j (A_i - \rho_i)(A_j - \rho_j)] = 0$, so that $\Sigma_\beta$ and $c_N$ have the same simplified forms.

Rationale for multiple of F distribution. The distribution of the quadratic form $n(\bar X - \mu)'S^{-1}(\bar X - \mu)$, constructed from a random sample of size $n$ of $N(\mu, \Sigma)$ random variables in which $S$ is the sample covariance matrix, follows a Hotelling's T-squared distribution. The Hotelling's T-squared distribution is a multiple of the F distribution, $F_{d_1, d_2}$, in which $d_1$ is the dimension of $\mu$ and $d_2$ is the sample size minus $d_1$. Our sample size approximation replaces $d_1$ by $p$ (the number of parameters in the test statistic) and $d_2$ by $n - q - p$ (the sample size minus the number of nuisance parameters minus $d_1$).

Formula for adjusted W and Q. Define an individual-specific residual vector $\hat\epsilon$ as the $T \times 1$ vector with $t$th entry $\hat\epsilon_t = Y_{t+1} - I_t B_t'\hat\alpha - I_t (A_t - \rho_t) Z_t'\hat\beta$. For each individual, define the $t$th row of the $T \times (p+q)$ individual-specific matrix $X$ by $I_t\,(B_t', (A_t - \rho_t) Z_t')$. Then define $H = X(\mathbb{P}_N X'X)^{-1}X'$. The matrix $\hat Q^{-1}$ is given by the lower right $p \times p$ block in the inverse of $\mathbb{P}_N X'X$; the matrix $\hat W$ is given by the lower right $p \times p$ block in $\mathbb{P}_N[X'(I - H)^{-1}\hat\epsilon\hat\epsilon'(I - H)^{-1}X]$.


Appendix B Further Simulations and Details 


B.1 Simulation Results When Working Assumptions are True

We conduct a variety of simulations in settings in which the working assumptions hold, the scientist provides the correct pattern for the expected availability, $\tau_t = E[I_t]$, and, under the alternative, the standardized proximal main effect is $d(t) = Z_t'd$. Here we mainly focus on the setup where the duration of the study is 42 days and there are 5 decision times within each day, but similar results are obtained in other setups; see below. The randomization probability is 0.4, i.e. $\rho = \rho_t = P(A_t = 1) = 0.4$. The sample size formula is as given in the main text. The test statistic is as given in the main text, with $B_t$ and $Z_t$ equal to $(1, \lfloor (t-1)/5 \rfloor, \lfloor (t-1)/5 \rfloor^2)$. All simulations are based on 1,000 simulated data sets. The significance level is 0.05 and the desired power is 80%.

In the first simulation, the data for each simulated subject are generated sequentially as follows. For $t = 1, \dots, T = 210$, $I_t$, $A_t$ and $Y_{t+1}$ are generated by

$$I_t \sim \mathrm{Ber}(\tau_t), \qquad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \alpha(t) + (A_t - \rho)\, d(t) + \epsilon_t, \quad \text{if } I_t = 1$$

where $d(t) = Z_t'd$ and $Z_t$ are the same as in the sample size model. The conditional mean, $E[Y_{t+1} \mid I_t = 1] = \alpha(t)$, is given by $\alpha(t) = \alpha_1 + \alpha_2\lfloor (t-1)/5 \rfloor + \alpha_3\lfloor (t-1)/5 \rfloor^2$, where $\alpha_1 = 2.5$, $\alpha_2 = 0.0727$, $\alpha_3 = -8.66 \times 10^{-4}$ (so that $\frac{1}{T}\sum_{t=1}^{T}\alpha(t) - \alpha(1) = 1$ and $\arg\max_t \alpha(t) = T$). We consider 5 differing distributions for the errors $\{\epsilon_t\}_{t=1}^{T}$: independent normal; independent (scaled) Student's t distribution with 3 degrees of freedom; independent (centered) exponential distribution with $\lambda = 1$; a Gaussian AR(1) process, i.e. $\epsilon_t = \phi\epsilon_{t-1} + \nu_t$, where $\nu_t$ is white noise with variance $\sigma_\nu^2$ such that $\mathrm{Var}(\epsilon_t) = 1$; and lastly a Gaussian AR(5) process with parameter $\phi$, driven by white noise $\nu_t$ with variance $\sigma_\nu^2$ such that $\mathrm{Var}(\epsilon_t) = 1$. In all cases the errors are scaled to have mean 0 and variance 1 (i.e. $E[\epsilon_t \mid I_t = 1] = 0$, $\mathrm{Var}[\epsilon_t \mid I_t = 1] = 1$). Additionally, four availability patterns, i.e. time-varying values for $\tau_t = E[I_t]$, are considered; see Figure 1. The simulated Type 1 error rate and power when the duration of the study is 42 days are reported in Tables 2B and 3B. The simulation results in other setups, i.e. study lengths of 4 weeks and 8 weeks, are reported in Table 4B. The associated sample sizes are given in Table 1B.
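A quick numerical check that $\alpha_1 = 2.5$, $\alpha_2 = 0.0727$ and $\alpha_3 = -8.66 \times 10^{-4}$ satisfy the two stated constraints, assuming the day index $k = \lfloor (t-1)/5 \rfloor$ and $T = 210$; the helper name is hypothetical.

```python
def alpha_t(t, a1=2.5, a2=0.0727, a3=-8.66e-4, times_per_day=5):
    """alpha(t) = a1 + a2*k + a3*k^2 with day index k = floor((t - 1) / 5)."""
    k = (t - 1) // times_per_day
    return a1 + a2 * k + a3 * k * k


T = 210
values = [alpha_t(t) for t in range(1, T + 1)]
mean_gap = sum(values) / T - alpha_t(1)       # (1/T) sum_t alpha(t) - alpha(1)
peak_at_end = values[-1] == max(values)       # maximum attained on the last day
print(round(mean_gap, 3), peak_at_end)        # 0.999 True
```

The average gap equals 1 up to the rounding of the published coefficients, and the vertex of the quadratic (about day index 42) lies beyond the last study day, so $\alpha(t)$ increases throughout.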

Neither the working assumptions nor the inputs to the sample size formula specify the dependence of the availability indicator, $I_t$, on past treatment. Thus, in the second simulation, we consider the setting in which the availability decreases as the number of treatments provided in the recent past increases. In particular, the data are generated as follows:

$$I_t \sim \mathrm{Ber}\Big(\tau_t + \eta\sum_{j}\big(A_{t-j}I_{t-j} - E[A_{t-j}I_{t-j}]\big)\Big), \qquad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \alpha(t) + (A_t - \rho)\, d(t) + \epsilon_t, \quad \text{if } I_t = 1$$

where the sum runs over recent past decision times. Note that since we center $A_{t-j}I_{t-j}$ in the generative model of $I_t$, the expected availability is $\tau_t$. The specifications of $\alpha(t)$, $d(t)$ and $\epsilon_t$ are the same as in the first simulation. The simulated Type I error rate and power are reported in Table 5B.

B.2 Further Details When Working Assumptions are False 

B.2.1 Working Assumption (a) is Violated. 

Here we consider another setting in which working assumption (a) is violated, i.e. the underlying true $E[Y_{t+1} \mid I_t = 1]$ follows a non-quadratic form (recall that $B_t$ is given by $(1, \lfloor (t-1)/5 \rfloor, \lfloor (t-1)/5 \rfloor^2)$). The data are generated as follows:

$$I_t \sim \mathrm{Ber}(\tau_t), \qquad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t'd + \epsilon_t, \quad \text{if } I_t = 1$$

where $\alpha(t) = E[Y_{t+1} \mid I_t = 1]$ is plotted in the accompanying figure in this appendix. For each case, $\alpha(t)$ satisfies $\alpha(1) = 2.5$ and $\frac{1}{T}\sum_{t=1}^{T}\alpha(t) - \alpha(1) = 0.1$. The error terms are i.i.d. $N(0,1)$. The day of maximal proximal effect is assumed to be 29. Additionally, different values of the average standardized treatment effect and the four patterns of availability in Figure 1, each with average 0.5, are considered. The simulation results are reported in Table 7B.


B.2.2 Additional Simulation Results When Other Working Assumptions are False 

The main body of the paper reports part of the results when working assumptions (b), (c) and (d) are violated. Additional simulation results are provided here. In particular, the simulation result when $d(t)$ follows other non-quadratic forms, i.e. when working assumption (b) is false, is reported in Table 9B (see the accompanying figure). The simulated Type 1 error rate and power when working assumption (c) is false are reported in Table 10B. The simulated Type 1 error rate when working assumption (d) is violated is reported in Table 11B.


B.2.3 Simulation Results when $\bar d$ and $\bar\tau$ are Misspecified.

As discussed in the paper, the first scenario considers the setting in which the scientist provides the correct availability pattern, $\{E[I_t]\}_{t=1}^{T}$, the correct time at which the maximal standardized proximal main effect is achieved ($\arg\max_t Z_t'd$) and the correct initial standardized proximal main effect ($Z_1'd = d_1 = 0$), but provides too low a value of the averaged across time, standardized proximal main effect $\bar d = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$. The simulated power is provided in Table 12B.

In the second scenario, the scientist provides the correct $\arg\max_t Z_t'd$, the correct $Z_1'd = d_1 = 0$ and the correct $\bar d = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$, and, although the scientist's time-varying pattern of availability is correct, the magnitude, i.e. the average availability, is underestimated. The simulation result is in Table 13B.
Table 1B: Sample sizes when the proximal treatment effect satisfies $d(t) = Z_t'd$. The significance level is 0.05. The desired power is 0.80.

                                    τ̄ = 0.5               τ̄ = 0.7
                                    Average proximal effect d̄
Duration   Availability   Max     0.10   0.08   0.06     0.10   0.08   0.06
4-week     Pattern 1      15      59     89     154      43     65     112
                          22      60     91     158      44     66     114
                          29      58     87     152      43     64     110
           Pattern 2      15      59     89     154      43     65     112
                          22      60     92     159      44     67     115
                          29      58     89     154      43     64     111
           Pattern 3      15      59     90     157      44     66     113
                          22      63     96     167      46     69     119
                          29      62     94     163      45     67     115
           Pattern 4      15      59     89     155      43     65     112
                          22      57     86     150      43     64     110
                          29      54     82     142      41     61     105
6-week     Pattern 1      22      41     61     105      31     45     76
                          29      42     64     109      32     47     79
                          36      41     62     106      31     45     77
           Pattern 2      22      41     61     105      31     45     76
                          29      43     64     110      32     47     80
                          36      42     62     107      31     46     77
           Pattern 3      22      42     62     106      31     46     77
                          29      44     66     114      33     48     82
                          36      43     65     112      32     47     80
           Pattern 4      22      41     62     106      31     45     77
                          29      41     62     106      31     46     78
                          36      40     59     101      30     44     74
8-week     Pattern 1      29      32     47     80       25     35     58
                          36      33     49     84       26     37     61
                          43      33     48     82       25     36     60
           Pattern 2      29      32     47     80       25     35     58
                          36      34     49     84       26     37     61
                          43      33     49     82       25     36     60
           Pattern 3      29      33     48     82       25     36     59
                          36      35     51     87       26     38     63
                          43      34     50     86       26     37     62
           Pattern 4      29      33     48     81       25     36     59
                          36      33     49     83       25     36     61
                          43      32     47     80       25     35     59

"Max" is the day on which the maximal proximal effect is attained; $\bar\tau = \frac{1}{T}\sum_{t=1}^{T} E[I_t]$ is the average availability.
Table 2B: Simulated Type I error rate (%) when working assumptions are true. The duration of the study is 6 weeks. The associated sample size is given in Table 1B.

                                              τ̄ = 0.5              τ̄ = 0.7
                                              Average proximal effect d̄
Error term          Availability   Max     0.10   0.08   0.06     0.10   0.08   0.06
i.i.d. Normal       Pattern 1      22      3.8    4.5    4.9      4.6    5.3    4.8
                                   29      4.7    6.0    4.6      4.0    3.2    5.0
                                   36      5.0    5.4    4.9      4.3    4.8    4.6
                    Pattern 2      22      4.8    4.1    4.8      4.4    3.5    4.1
                                   29      4.3    6.2    3.2      4.6    4.2    4.2
                                   36      4.5    4.8    5.2      4.5    3.5    5.4
                    Pattern 3      22      4.7    4.5    6.3      4.4    4.9    4.9
                                   29      4.1    5.1    4.6      4.3    6.0    5.6
                                   36      4.7    4.4    4.6      4.1    5.1    4.4
                    Pattern 4      22      5.4    3.5    4.5      4.8    4.7    5.0
                                   29      5.2    4.5    4.5      5.0    5.0    5.1
                                   36      3.8    4.1    5.4      4.7    5.0    5.9
i.i.d. t dist.      Pattern 1      22      4.3    4.4    3.2      4.1    4.1    5.2
                                   29      5.0    3.8    3.2      3.7    4.2    6.3
                                   36      4.3    4.5    4.0      5.0    5.7    5.4
i.i.d. Exp.         Pattern 1      22      4.5    4.6    4.4      3.7    7.1    3.1
                                   29      4.5    4.6    4.2      4.5    4.5    4.7
                                   36      2.7    4.8    4.8      3.9    3.7    3.4
AR(1), φ = -0.6     Pattern 1      22      4.3    5.3    4.6      3.8    4.2    4.0
                                   29      4.6    5.4    5.1      4.0    4.4    4.3
                                   36      4.7    4.0    4.0      4.1    4.2    3.9
AR(1), φ = -0.3     Pattern 1      22      5.8    3.4    4.4      3.3    4.0    5.4
                                   29      4.9    4.7    4.6      5.5    5.5    4.5
                                   36      4.0    4.7    4.4      4.9    5.0    4.7
AR(1), φ = 0.3      Pattern 1      22      4.6    4.6    4.9      4.3    5.4    4.1
                                   29      4.8    5.3    4.1      4.3    4.2    5.2
                                   36      3.6    3.9    4.9      4.8    4.9    4.9
AR(1), φ = 0.6      Pattern 1      22      4.4    5.1    4.9      3.6    5.2    3.7
                                   29      3.7    4.9    4.6      4.5    4.3    5.8
                                   36      4.4    6.7    5.2      5.6    3.6    5.1
AR(5), φ = -0.6     Pattern 1      22      4.4    4.7    5.1      4.2    4.5    5.5
                                   29      4.3    5.1    4.3      3.2    3.5    4.2
                                   36      5.3    4.5    6.1      4.2    4.6    5.4
AR(5), φ = -0.3     Pattern 1      22      3.7    4.4    6.0      5.0    4.5    3.5
                                   29      4.4    4.7    5.2      5.3    4.5    5.0
                                   36      4.5    5.0    5.1      4.1    5.3    4.8
AR(5), φ = 0.3      Pattern 1      22      5.3    4.3    5.7      4.8    4.1    4.3
                                   29      3.9    4.8    4.1      4.0    4.3    4.9
                                   36      4.2    5.5    5.1      3.6    4.5    3.6
AR(5), φ = 0.6      Pattern 1      22      5.1    4.5    4.0      4.5    3.8    5.2
                                   29      5.2    4.8    4.5      2.9    5.3    4.4
                                   36      4.1    3.6    4.6      3.9    4.4    4.9

"Max" is the day on which the maximal proximal effect is attained, $\bar\tau = \frac{1}{T}\sum_{t=1}^{T} E[I_t]$ is the average availability, and $\phi$ is the parameter for the AR(1) and AR(5) processes. Bold numbers are significantly (at .05 level) greater than .05.
Table 3B: Simulated power (%) when the working assumptions are true. The duration of the study is 6 weeks. The associated sample size is given in Table 1B.

| Error term | Availability pattern | Max | $\bar\tau=0.5$, effect 0.10 | $\bar\tau=0.5$, effect 0.08 | $\bar\tau=0.5$, effect 0.06 | $\bar\tau=0.7$, effect 0.10 | $\bar\tau=0.7$, effect 0.08 | $\bar\tau=0.7$, effect 0.06 |
|---|---|---|---|---|---|---|---|---|
| i.i.d. Normal | Pattern 1 | 22 | 80.9 | 80.0 | 81.0 | 78.7 | 77.5 | 80.7 |
| | | 29 | 78.4 | 80.6 | 77.8 | 80.6 | 78.7 | 79.0 |
| | | 36 | 80.2 | 80.0 | 79.6 | 79.4 | 80.2 | 77.0 |
| | Pattern 2 | 22 | 80.3 | 78.1 | 78.8 | 80.6 | 79.6 | 79.8 |
| | | 29 | 80.3 | 79.1 | 80.2 | 77.4 | 79.9 | 79.9 |
| | | 36 | 76.8 | 79.3 | 80.2 | 78.5 | 78.4 | 80.0 |
| | Pattern 3 | 22 | 83.5 | 81.5 | 77.7 | 78.5 | 81.3 | 78.7 |
| | | 29 | 77.9 | 79.1 | 78.5 | 77.8 | 78.8 | 79.0 |
| | | 36 | 77.3 | 78.1 | 79.8 | 79.8 | 79.9 | 79.1 |
| | Pattern 4 | 22 | 77.2 | 79.7 | 81.8 | 80.2 | 79.0 | 78.8 |
| | | 29 | 80.1 | 78.8 | 80.3 | 79.4 | 80.6 | 80.1 |
| | | 36 | 80.5 | 79.4 | 80.0 | 78.9 | 79.9 | 78.1 |
| i.i.d. t dist. | Pattern 1 | 22 | 80.4 | 81.9 | 81.0 | 79.7 | 79.4 | 80.7 |
| | | 29 | 81.7 | 82.2 | 82.2 | 79.1 | 82.3 | 77.3 |
| | | 36 | 80.8 | 78.8 | 79.5 | 81.8 | 81.6 | 79.9 |
| i.i.d. Exp. | Pattern 1 | 22 | 81.0 | 81.6 | 79.7 | 77.2 | 80.1 | 80.2 |
| | | 29 | 80.6 | 82.4 | 80.3 | 79.0 | 79.8 | 80.3 |
| | | 36 | 82.1 | 79.8 | 80.8 | 79.8 | 79.5 | 80.3 |
| AR(1), $\phi=-0.6$ | Pattern 1 | 22 | 78.5 | 80.3 | 78.5 | 82.3 | 79.8 | 80.3 |
| | | 29 | 78.7 | 80.8 | 80.0 | 77.1 | 79.5 | 77.9 |
| | | 36 | 77.7 | 80.3 | 80.2 | 78.2 | 77.4 | 83.6 |
| AR(1), $\phi=-0.3$ | Pattern 1 | 22 | 77.9 | 79.0 | 79.6 | 80.0 | 77.8 | 80.4 |
| | | 29 | 77.9 | 79.1 | 80.0 | 79.0 | 78.0 | 78.4 |
| | | 36 | 78.1 | 81.2 | 80.2 | 80.7 | 80.9 | 78.4 |
| AR(1), $\phi=0.3$ | Pattern 1 | 22 | 80.2 | 78.5 | 80.8 | 80.5 | 79.6 | 82.6 |
| | | 29 | 78.0 | 80.0 | 80.0 | 78.0 | 79.4 | 80.1 |
| | | 36 | 77.6 | 82.5 | 80.6 | 77.0 | 78.9 | 82.0 |
| AR(1), $\phi=0.6$ | Pattern 1 | 22 | 80.4 | 79.8 | 79.5 | 80.7 | 79.5 | 82.0 |
| | | 29 | 78.9 | 81.5 | 79.3 | 79.5 | 81.3 | 79.5 |
| | | 36 | 79.5 | 78.4 | 78.8 | 80.1 | 77.9 | 77.8 |
| AR(5), $\phi=-0.6$ | Pattern 1 | 22 | 79.9 | 79.4 | 80.0 | 78.7 | 79.2 | 79.4 |
| | | 29 | 80.0 | 78.3 | 79.1 | 76.8 | 79.6 | 79.3 |
| | | 36 | 80.5 | 80.0 | 79.2 | 80.1 | 78.0 | 80.4 |
| AR(5), $\phi=-0.3$ | Pattern 1 | 22 | 79.2 | 80.4 | 81.9 | 81.3 | 77.7 | 79.1 |
| | | 29 | 80.0 | 82.3 | 80.5 | 80.5 | 82.2 | 79.2 |
| | | 36 | 75.9 | 78.7 | 79.3 | 79.0 | 79.4 | 79.9 |
| AR(5), $\phi=0.3$ | Pattern 1 | 22 | 79.4 | 80.8 | 79.8 | 79.5 | 77.3 | 81.2 |
| | | 29 | 78.0 | 79.2 | 79.2 | 79.2 | 80.5 | 78.4 |
| | | 36 | 78.3 | 79.1 | 78.1 | 80.7 | 80.5 | 79.5 |
| AR(5), $\phi=0.6$ | Pattern 1 | 22 | 80.2 | 77.9 | 80.3 | 78.6 | 78.4 | 80.3 |
| | | 29 | 76.9 | 79.3 | 80.2 | 79.1 | 80.6 | 80.5 |
| | | 36 | 78.7 | 84.0 | 80.1 | 78.8 | 79.3 | 78.8 |


“Max” is the day on which the maximal proximal effect is attained, $\bar\tau = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability, and $\phi$ is the parameter of the AR(1) and AR(5) processes. Bold numbers are significantly (at the .05 level) less than .80.
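A cell is flagged when its simulated power is significantly below the 80% target. A minimal sketch of such a one-sided check follows. It assumes a normal approximation to the binomial and a known number of simulated trials per cell (`n_sim`), which this section does not report; the function name `power_below_target` is hypothetical and not taken from the authors' code.

```python
from statistics import NormalDist


def power_below_target(power_pct: float, n_sim: int,
                       target: float = 0.80, alpha: float = 0.05) -> bool:
    """One-sided test of H0: power = target against H1: power < target.

    power_pct is a simulated power in percent, as reported in the tables;
    n_sim is the number of simulated trials behind that estimate (assumed).
    Uses the normal approximation to the binomial, with the standard error
    evaluated under H0.
    """
    p_hat = power_pct / 100.0
    se = (target * (1.0 - target) / n_sim) ** 0.5
    z = (p_hat - target) / se
    # Lower-tail p-value; flag the cell when it falls below alpha.
    return NormalDist().cdf(z) < alpha


if __name__ == "__main__":
    # Assuming 1000 simulated trials per cell (illustrative only).
    for value in (77.0, 78.0, 80.9):
        print(value, power_below_target(value, n_sim=1000))
```

With the assumed `n_sim = 1000`, the standard error under the null is about 1.26 percentage points, so values of 77.9% or lower are flagged while 78.0% is not.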
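Table 4B: Simulated type 1 error rate (%) and power (%) when the duration of the study is 4 weeks and 8 weeks. Error terms follow i.i.d. N(0,1). The associated sample size is given in Table 1B.

Type 1 error rate (%):

| Duration of study | Availability pattern | Max | $\bar\tau=0.5$, effect 0.10 | $\bar\tau=0.5$, effect 0.08 | $\bar\tau=0.5$, effect 0.06 | $\bar\tau=0.7$, effect 0.10 | $\bar\tau=0.7$, effect 0.08 | $\bar\tau=0.7$, effect 0.06 |
|---|---|---|---|---|---|---|---|---|
| 4-week | Pattern 1 | 15 | 4.1 | 4.7 | 6.3 | 5.3 | 5.5 | 5.6 |
| | | 22 | 5.2 | 4.4 | 4.7 | 3.1 | 4.7 | 4.4 |
| | | 29 | 5.7 | 5.5 | 5.6 | 4.3 | 4.2 | 4.2 |
| | Pattern 2 | 15 | 4.8 | 4.8 | 5.0 | 5.0 | 5.2 | 5.3 |
| | | 22 | 5.1 | 5.2 | 4.7 | 3.7 | 4.2 | 3.7 |
| | | 29 | 5.6 | 5.1 | 4.2 | 4.2 | 4.9 | 4.4 |
| | Pattern 3 | 15 | 4.7 | 5.0 | 4.6 | 6.1 | 5.3 | 5.1 |
| | | 22 | 4.9 | 4.0 | 6.6 | 4.2 | 3.8 | 4.1 |
| | | 29 | 4.7 | 4.3 | 5.1 | 4.6 | 5.8 | 3.5 |
| | Pattern 4 | 15 | 4.9 | 4.6 | 4.8 | 3.0 | 5.9 | 3.8 |
| | | 22 | 3.5 | 5.1 | 4.5 | 5.2 | 3.8 | 6.0 |
| | | 29 | 4.4 | 6.4 | 4.7 | 4.4 | 4.3 | 4.7 |
| 8-week | Pattern 1 | 29 | 4.1 | 4.6 | 4.0 | 5.3 | 5.0 | 5.9 |
| | | 36 | 3.3 | 4.7 | 6.5 | 4.6 | 5.4 | 4.3 |
| | | 43 | 3.2 | 5.1 | 5.2 | 5.0 | 3.4 | 5.0 |
| | Pattern 2 | 29 | 3.9 | 5.0 | 4.5 | 4.2 | 3.7 | 4.1 |
| | | 36 | 3.8 | 4.6 | 4.9 | 4.5 | 3.4 | 5.2 |
| | | 43 | 3.9 | 5.4 | 5.0 | 3.4 | 3.8 | 5.0 |
| | Pattern 3 | 29 | 4.6 | 4.2 | 3.7 | 5.2 | 4.1 | 4.0 |
| | | 36 | 4.3 | 5.1 | 6.1 | 4.6 | 5.0 | 4.6 |
| | | 43 | 4.6 | 6.0 | 4.1 | 5.0 | 4.9 | 4.0 |
| | Pattern 4 | 29 | 4.5 | 5.2 | 2.9 | 3.6 | 5.3 | 4.4 |
| | | 36 | 4.5 | 5.2 | 3.7 | 2.7 | 3.7 | 4.7 |
| | | 43 | 4.2 | 7.1 | 4.9 | 4.4 | 4.5 | 4.8 |

Power (%):

| Duration of study | Availability pattern | Max | $\bar\tau=0.5$, effect 0.10 | $\bar\tau=0.5$, effect 0.08 | $\bar\tau=0.5$, effect 0.06 | $\bar\tau=0.7$, effect 0.10 | $\bar\tau=0.7$, effect 0.08 | $\bar\tau=0.7$, effect 0.06 |
|---|---|---|---|---|---|---|---|---|
| 4-week | Pattern 1 | 15 | 80.4 | 79.0 | 78.5 | 79.6 | 82.8 | 80.3 |
| | | 22 | 78.8 | 78.7 | 80.7 | 78.7 | 79.2 | 80.0 |
| | | 29 | 76.2 | 80.6 | 80.1 | 81.3 | 80.1 | 79.1 |
| | Pattern 2 | 15 | 82.4 | 77.8 | 77.2 | 75.9 | 80.0 | 78.9 |
| | | 22 | 77.2 | 80.3 | 81.5 | 75.8 | 80.7 | 82.0 |
| | | 29 | 80.1 | 79.3 | 80.1 | 78.0 | 77.7 | 76.9 |
| | Pattern 3 | 15 | 79.3 | 79.8 | 79.2 | 79.1 | 76.5 | 80.8 |
| | | 22 | 80.0 | 80.0 | 79.0 | 79.0 | 80.2 | 81.8 |
| | | 29 | 79.4 | 80.7 | 79.3 | 80.4 | 79.6 | 79.2 |
| | Pattern 4 | 15 | 82.6 | 78.3 | 79.2 | 80.5 | 80.0 | 79.5 |
| | | 22 | 80.4 | 80.7 | 79.3 | 79.1 | 78.5 | 79.2 |
| | | 29 | 78.4 | 79.2 | 78.5 | 79.6 | 79.2 | 80.5 |
| 8-week | Pattern 1 | 29 | 79.7 | 77.3 | 76.4 | 79.1 | 82.2 | 79.6 |
| | | 36 | 78.8 | 78.6 | 81.5 | 80.3 | 78.2 | 79.6 |
| | | 43 | 80.4 | 77.8 | 78.7 | 79.1 | 80.3 | 80.1 |
| | Pattern 2 | 29 | 79.3 | 81.1 | 79.8 | 78.7 | 79.7 | 80.2 |
| | | 36 | 81.2 | 78.5 | 79.0 | 81.3 | 80.8 | 78.2 |
| | | 43 | 80.3 | 81.5 | 77.5 | 75.1 | 78.8 | 78.1 |
| | Pattern 3 | 29 | 80.1 | 79.0 | 77.1 | 78.2 | 80.4 | 78.8 |
| | | 36 | 79.5 | 79.9 | 79.6 | 80.0 | 80.8 | 79.6 |
| | | 43 | 80.5 | 79.5 | 79.6 | 79.4 | 79.4 | 80.2 |
| | Pattern 4 | 29 | 82.1 | 79.7 | 80.7 | 79.7 | 79.0 | 78.4 |
| | | 36 | 77.8 | 78.2 | 80.1 | 77.9 | 76.9 | 79.5 |
| | | 43 | 79.6 | 78.5 | 78.1 | 79.4 | 80.6 | 79.5 |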


“Max” is the day on which the maximal proximal effect is attained, and $\bar\tau = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability. Bold numbers are significantly (at the .05 level) greater than .05 (for the type 1 error rate) and less than .80 (for power).
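For the type 1 error rate the footnote's criterion runs in the opposite direction: a cell is flagged when its rejection rate is significantly above the nominal 5%. The sketch below uses a one-sided normal approximation to the binomial with an assumed replication count `n_sim` (not reported in this section); `size_above_nominal` is a hypothetical helper, not the authors' code.

```python
from statistics import NormalDist


def size_above_nominal(rate_pct: float, n_sim: int,
                       nominal: float = 0.05, alpha: float = 0.05) -> bool:
    """One-sided test of H0: type 1 error = nominal vs H1: type 1 error > nominal.

    rate_pct is a simulated rejection rate in percent; n_sim is the
    (assumed) number of simulated trials behind it. The standard error is
    evaluated under H0 (normal approximation to the binomial).
    """
    p_hat = rate_pct / 100.0
    se = (nominal * (1.0 - nominal) / n_sim) ** 0.5
    z = (p_hat - nominal) / se
    # Upper-tail p-value; flag the cell when it falls below alpha.
    return 1.0 - NormalDist().cdf(z) < alpha


if __name__ == "__main__":
    # Assuming 1000 simulated trials per cell (illustrative only).
    for value in (4.1, 6.1, 6.3, 7.1):
        print(value, size_above_nominal(value, n_sim=1000))
```

Under an assumed `n_sim = 1000`, the null standard error is about 0.69 percentage points, so 6.3% would be flagged and 6.1% would not.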
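Table 5B: Simulated type 1 error rate (%) and power (%) when the availability indicator $I_t$ depends on recent past treatments, with $\eta = -0.2$. The expected availability is constant in $t$ and equal to 0.5. The duration of the study is 42 days. The associated sample size is given in Table 1B.

Type 1 error rate (%):

| Error term | $\phi$ | Max | $\bar\tau=0.5$, effect 0.10 | $\bar\tau=0.5$, effect 0.08 | $\bar\tau=0.5$, effect 0.06 | $\bar\tau=0.7$, effect 0.10 | $\bar\tau=0.7$, effect 0.08 | $\bar\tau=0.7$, effect 0.06 |
|---|---|---|---|---|---|---|---|---|
| AR(1) | -0.6 | 22 | 4.8 | 5.4 | 4.5 | 3.4 | 5.8 | 3.7 |
| | | 29 | 4.7 | 4.4 | 4.2 | 4.0 | 4.9 | 4.6 |
| | | 36 | 4.3 | 5.3 | 4.4 | 4.2 | 3.9 | 5.5 |
| | -0.3 | 22 | 4.7 | 3.8 | 4.4 | 3.5 | 4.4 | 4.6 |
| | | 29 | 3.8 | 4.0 | 4.9 | 3.5 | 5.0 | 4.4 |
| | | 36 | 2.7 | 5.7 | 4.0 | 3.3 | 4.7 | 5.2 |
| | 0.3 | 22 | 4.8 | 4.1 | 4.4 | 5.0 | 5.4 | 3.6 |
| | | 29 | 4.9 | 4.6 | 5.0 | 4.4 | 5.5 | 5.6 |
| | | 36 | 4.9 | 4.9 | 4.2 | 3.3 | 4.5 | 4.8 |
| | 0.6 | 22 | 4.5 | 5.1 | 4.7 | 4.3 | 4.6 | 4.0 |
| | | 29 | 3.4 | 4.5 | 5.1 | 4.4 | 4.3 | 4.6 |
| | | 36 | 4.8 | 4.3 | 4.2 | 4.1 | 4.5 | 4.5 |
| AR(5) | -0.6 | 22 | 4.8 | 4.6 | 4.3 | 3.7 | 4.7 | 3.5 |
| | | 29 | 6.5 | 4.1 | 4.5 | 3.3 | 4.5 | 4.8 |
| | | 36 | 3.5 | 5.7 | 4.4 | 4.6 | 4.7 | 5.7 |
| | -0.3 | 22 | 4.3 | 4.9 | 4.0 | 4.3 | 5.6 | 5.0 |
| | | 29 | 3.9 | 4.0 | 5.0 | 3.2 | 5.7 | 5.1 |
| | | 36 | 4.0 | 3.6 | 4.7 | 4.8 | 4.8 | 3.2 |
| | 0.3 | 22 | 3.5 | 4.9 | 5.0 | 4.1 | 3.8 | 4.1 |
| | | 29 | 4.6 | 6.1 | 4.7 | 4.7 | 4.1 | 4.1 |
| | | 36 | 5.1 | 4.4 | 4.0 | 3.2 | 3.9 | 4.7 |
| | 0.6 | 22 | 5.0 | 4.6 | 4.3 | 4.0 | 4.0 | 5.5 |
| | | 29 | 5.6 | 4.3 | 6.9 | 5.6 | 3.4 | 3.1 |
| | | 36 | 4.8 | 4.8 | 4.8 | 3.5 | 3.7 | 5.5 |

Power (%):

| Error term | $\phi$ | Max | $\bar\tau=0.5$, effect 0.10 | $\bar\tau=0.5$, effect 0.08 | $\bar\tau=0.5$, effect 0.06 | $\bar\tau=0.7$, effect 0.10 | $\bar\tau=0.7$, effect 0.08 | $\bar\tau=0.7$, effect 0.06 |
|---|---|---|---|---|---|---|---|---|
| AR(1) | -0.6 | 22 | 81.5 | 78.0 | 79.4 | 81.7 | 77.9 | 80.7 |
| | | 29 | 79.4 | 80.9 | 80.7 | 78.2 | 79.2 | 79.7 |
| | | 36 | 79.5 | 81.5 | 79.8 | 80.2 | 79.2 | 80.7 |
| | -0.3 | 22 | 78.7 | 81.2 | 80.3 | 80.9 | 77.9 | 78.5 |
| | | 29 | 80.1 | 79.5 | 81.2 | 77.3 | 79.5 | 77.1 |
| | | 36 | 76.8 | 80.4 | 79.9 | 78.8 | 79.5 | 79.4 |
| | 0.3 | 22 | 83.0 | 79.8 | 79.4 | 81.3 | 78.9 | 79.2 |
| | | 29 | 79.5 | 80.3 | 82.2 | 78.5 | 80.7 | 77.6 |
| | | 36 | 80.0 | 78.9 | 79.5 | 81.7 | 79.4 | 79.6 |
| | 0.6 | 22 | 80.3 | 78.9 | 81.1 | 81.2 | 81.5 | 77.9 |
| | | 29 | 79.3 | 76.2 | 79.4 | 81.3 | 80.6 | 79.4 |
| | | 36 | 77.5 | 80.5 | 80.9 | 76.7 | 80.0 | 79.7 |
| AR(5) | -0.6 | 22 | 81.9 | 81.4 | 81.6 | 79.8 | 78.3 | 78.9 |
| | | 29 | 77.5 | 79.9 | 79.8 | 79.9 | 79.3 | 79.3 |
| | | 36 | 77.8 | 80.8 | 78.6 | 77.9 | 79.2 | 81.7 |
| | -0.3 | 22 | 77.7 | 81.8 | 80.0 | 80.1 | 80.3 | 81.1 |
| | | 29 | 80.0 | 80.9 | 80.3 | 80.6 | 80.3 | 77.8 |
| | | 36 | 79.0 | 80.4 | 80.8 | 80.1 | 79.0 | 76.5 |
| | 0.3 | 22 | 77.4 | 82.9 | 78.5 | 80.6 | 81.4 | 80.2 |
| | | 29 | 78.7 | 82.0 | 78.0 | 81.4 | 76.5 | 81.3 |
| | | 36 | 79.7 | 81.8 | 78.6 | 79.1 | 77.4 | 79.0 |
| | 0.6 | 22 | 80.5 | 79.4 | 82.5 | 79.2 | 81.1 | 81.0 |
| | | 29 | 78.3 | 80.0 | 80.5 | 80.8 | 80.4 | 78.4 |
| | | 36 | 78.2 | 80.5 | 80.3 | 77.6 | 80.5 | 79.1 |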


“Max” is the day on which the maximal proximal effect is attained, $\bar\tau = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability, and $\phi$ is the parameter of the AR(1) and AR(5) processes. Bold numbers are significantly (at the .05 level) greater than .05 (for the type 1 error rate) and less than .80 (for power).


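Table 6B: Simulated type 1 error rate (%) and power (%) when working assumption (a) is violated (Scenario 1). The average availability is 0.5. The day of maximal proximal effect is 29.

| $\theta$ | $\bar d$ | Type 1 error, Pattern 1 | Type 1 error, Pattern 2 | Type 1 error, Pattern 3 | Type 1 error, Pattern 4 | Power, Pattern 1 | Power, Pattern 2 | Power, Pattern 3 | Power, Pattern 4 |
|---|---|---|---|---|---|---|---|---|---|
| $0.5\bar d$ | 0.10 | 5.5 | 4.6 | 4.2 | 5.1 | 79.7 | 79.4 | 80.5 | 80.1 |
| | 0.08 | 5.1 | 4.4 | 5.4 | 4.6 | 80.4 | 78.9 | 80.4 | 78.7 |
| | 0.06 | 4.1 | 5.5 | 4.6 | 4.3 | 77.5 | 82.7 | 81.0 | 81.0 |
| $\bar d$ | 0.10 | 4.8 | 4.3 | 3.7 | 4.1 | 79.3 | 78.3 | 77.8 | 79.4 |
| | 0.08 | 5.4 | 4.9 | 4.6 | 5.5 | 78.8 | 79.3 | 78.0 | 80.6 |
| | 0.06 | 4.4 | 3.5 | 5.1 | 4.6 | 78.4 | 79.3 | 79.0 | 80.4 |
| $1.5\bar d$ | 0.10 | 4.4 | 4.1 | 4.4 | 4.8 | 78.3 | 80.5 | 78.4 | 79.9 |
| | 0.08 | 5.0 | 4.3 | 4.3 | 3.9 | 80.5 | 79.7 | 78.7 | 81.9 |
| | 0.06 | 4.0 | 5.1 | 5.5 | 5.6 | 77.2 | 80.8 | 81.6 | 80.3 |
| $2\bar d$ | 0.10 | 4.1 | 3.8 | 5.0 | 5.5 | 77.7 | 78.8 | 79.0 | 78.4 |
| | 0.08 | 4.0 | 5.0 | 3.7 | 5.7 | 79.3 | 81.5 | 79.1 | 79.4 |
| | 0.06 | 4.9 | 4.3 | 5.2 | 5.3 | 80.8 | 79.0 | 77.5 | 80.9 |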


$\bar d = (1/T)\sum_{t=1}^{T} d(t)$ is the average proximal effect. $\theta$ is the coefficient of $W_t$ in $E[Y_{t+1} \mid I_t = 1]$. Bold numbers are significantly (at the .05 level) greater than .05 (for the type 1 error rate) and lower than .80 (for power).
Figure 4: Conditional expectation of the proximal response, $E[Y_{t+1} \mid I_t = 1]$. The horizontal axis is the decision time point. The vertical axis is $E[Y_{t+1} \mid I_t = 1]$.


Table 7B: Simulated type 1 error rate (%) and power (%) when working assumption (a) is violated (Scenario 2). The shapes of $\alpha(t) = E[Y_{t+1} \mid I_t = 1]$ and the patterns of availability are provided in Figure 4 and Figure [?]. The average availability is 0.5. The day of maximal proximal effect is 29. The associated sample size is given in Table 1B.

| $\alpha(t)$ | $\bar d$ | Type 1 error, Pattern 1 | Type 1 error, Pattern 2 | Type 1 error, Pattern 3 | Type 1 error, Pattern 4 | Power, Pattern 1 | Power, Pattern 2 | Power, Pattern 3 | Power, Pattern 4 |
|---|---|---|---|---|---|---|---|---|---|
| Shape 1 | 0.10 | 3.6 | 4.3 | 4.7 | 4.5 | 77.4 | 80.2 | 76.2 | 75.9 |
| | 0.08 | 5.9 | 3.8 | 4.1 | 3.4 | 79.7 | 80.1 | 78.9 | 80.6 |
| | 0.06 | 4.6 | 5.7 | 4.2 | 6.5 | 78.7 | 76.3 | 78.3 | 79.9 |
| Shape 2 | 0.10 | 4.8 | 4.8 | 4.4 | 4.1 | 79.2 | 79.1 | 78.5 | 79.7 |
| | 0.08 | 3.9 | 5.4 | 4.8 | 4.3 | 77.7 | 80.4 | 76.8 | 80.9 |
| | 0.06 | 5.1 | 5.5 | 3.4 | 4.9 | 78.3 | 79.4 | 79.8 | 80.2 |
| Shape 3 | 0.10 | 5.1 | 3.5 | 4.3 | 4.4 | 79.1 | 79.4 | 75.6 | 78.0 |
| | 0.08 | 4.6 | 5.0 | 6.2 | 3.8 | 78.3 | 78.1 | 79.1 | 78.1 |
| | 0.06 | 4.8 | 4.4 | 5.4 | 4.2 | 78.0 | 78.3 | 79.8 | 77.7 |


$\bar d = (1/T)\sum_{t=1}^{T} d(t)$ is the average standardized treatment effect. Bold numbers are significantly (at the .05 level) greater than .05 (for the type 1 error rate) and lower than .80 (for power).
Figure 5: Proximal main effects of treatment, $\{d(t)\}_{t=1}^{T}$, representing maintained, slightly degraded, and severely degraded time-varying treatment effects. The horizontal axis is the decision time point. The vertical axis is the standardized treatment effect. "Max" in the title refers to the day of maximal effect. The average standardized proximal effect is 0.1 in all plots.
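Table 8B: Sample sizes when working assumption (b) is violated. The shape of the standardized proximal effect, $d(t) = \beta(t)/\sigma$, and the pattern of availability, $E[I_t]$, are provided in Figure 5 and in Figure [?].

| $\bar d$ | Availability pattern | Max | $\bar\tau=0.5$, Maintained | $\bar\tau=0.5$, Slightly degraded | $\bar\tau=0.5$, Severely degraded | $\bar\tau=0.7$, Maintained | $\bar\tau=0.7$, Slightly degraded | $\bar\tau=0.7$, Severely degraded |
|---|---|---|---|---|---|---|---|---|
| 0.10 | Pattern 1 | 15 | 43 | 41 | 39 | 32 | 31 | 29 |
| | | 22 | 43 | 41 | 40 | 33 | 31 | 30 |
| | | 29 | 38 | 37 | 38 | 29 | 28 | 29 |
| | Pattern 2 | 15 | 43 | 41 | 39 | 33 | 31 | 30 |
| | | 22 | 43 | 42 | 40 | 33 | 31 | 30 |
| | | 29 | 38 | 37 | 38 | 29 | 28 | 29 |
| | Pattern 3 | 15 | 45 | 43 | 41 | 33 | 32 | 31 |
| | | 22 | 44 | 43 | 42 | 33 | 32 | 31 |
| | | 29 | 37 | 38 | 39 | 28 | 28 | 29 |
| | Pattern 4 | 15 | 42 | 39 | 37 | 32 | 30 | 28 |
| | | 22 | 44 | 41 | 39 | 33 | 31 | 30 |
| | | 29 | 39 | 38 | 38 | 29 | 28 | 28 |
| 0.08 | Pattern 1 | 15 | 65 | 61 | 58 | 48 | 45 | 43 |
| | | 22 | 65 | 62 | 60 | 48 | 46 | 44 |
| | | 29 | 56 | 55 | 56 | 42 | 41 | 42 |
| | Pattern 2 | 15 | 65 | 61 | 59 | 48 | 45 | 43 |
| | | 22 | 65 | 62 | 60 | 48 | 46 | 44 |
| | | 29 | 56 | 55 | 56 | 42 | 41 | 42 |
| | Pattern 3 | 15 | 67 | 64 | 62 | 49 | 47 | 45 |
| | | 22 | 66 | 64 | 63 | 48 | 47 | 46 |
| | | 29 | 56 | 56 | 59 | 41 | 41 | 43 |
| | Pattern 4 | 15 | 63 | 59 | 55 | 47 | 44 | 41 |
| | | 22 | 65 | 61 | 58 | 48 | 45 | 43 |
| | | 29 | 58 | 56 | 56 | 43 | 41 | 41 |
| 0.06 | Pattern 1 | 15 | 111 | 105 | 100 | 81 | 76 | 73 |
| | | 22 | 112 | 106 | 103 | 81 | 77 | 75 |
| | | 29 | 96 | 94 | 96 | 70 | 69 | 70 |
| | Pattern 2 | 15 | 112 | 105 | 100 | 81 | 77 | 73 |
| | | 22 | 112 | 106 | 103 | 81 | 77 | 75 |
| | | 29 | 96 | 94 | 96 | 70 | 68 | 70 |
| | Pattern 3 | 15 | 116 | 111 | 106 | 83 | 79 | 76 |
| | | 22 | 114 | 110 | 108 | 82 | 79 | 78 |
| | | 29 | 95 | 96 | 101 | 69 | 69 | 72 |
| | Pattern 4 | 15 | 108 | 100 | 94 | 79 | 74 | 70 |
| | | 22 | 112 | 105 | 99 | 81 | 76 | 73 |
| | | 29 | 100 | 95 | 95 | 72 | 69 | 70 |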


“Max” is the day on which the maximal proximal effect is attained, and $\bar d = (1/T)\sum_{t=1}^{T} d(t)$ is the average standardized treatment effect.
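Both averages used throughout these tables are plain means over the $T$ decision points: $\bar d = (1/T)\sum_t d(t)$ with $d(t) = \beta(t)/\sigma$ (from the caption), and $\bar\tau = (1/T)\sum_t E[I_t]$. A minimal sketch follows; the function names and the toy inputs are hypothetical and are not values from the study.

```python
from typing import List, Sequence


def time_average(values: Sequence[float]) -> float:
    """(1/T) * sum over t = 1..T of values[t-1], with T = len(values)."""
    if len(values) == 0:
        raise ValueError("need at least one decision point")
    return sum(values) / len(values)


def standardized_effects(beta: Sequence[float], sigma: float) -> List[float]:
    """d(t) = beta(t) / sigma at each decision point t."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return [b / sigma for b in beta]


if __name__ == "__main__":
    # Toy inputs only (not values from the study): T = 4 decision points.
    beta = [0.02, 0.04, 0.06, 0.08]       # hypothetical beta(t)
    sigma = 0.5                           # hypothetical residual SD
    availability = [0.6, 0.5, 0.5, 0.4]   # hypothetical E[I_t]

    d = standardized_effects(beta, sigma)
    print("d(t):", d)
    print("d_bar:", time_average(d))
    print("tau_bar:", time_average(availability))
```

Keeping the per-decision-point sequences explicit makes it easy to vary the shape of $d(t)$ (maintained or degraded) while holding $\bar d$ fixed, which is how the scenarios in Tables 8B and 9B are organized.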
Table 9B: Simulated power (%) when working assumption (b) is violated. The shape of the standardized proximal effect, $d(t) = \beta(t)/\sigma$, and the pattern of availability, $E[I_t]$, are provided in Figure 5 and in Figure [?]. The corresponding sample sizes are given in Table 8B.

| $\bar d$ | Availability pattern | Max | $\bar\tau=0.5$, Maintained | $\bar\tau=0.5$, Slightly degraded | $\bar\tau=0.5$, Severely degraded | $\bar\tau=0.7$, Maintained | $\bar\tau=0.7$, Slightly degraded | $\bar\tau=0.7$, Severely degraded |
|---|---|---|---|---|---|---|---|---|
| 0.10 | Pattern 1 | 15 | 78.4 | 78.8 | 78.6 | 79.1 | 80.1 | 77.6 |
| | | 22 | 80.4 | 79.5 | 81.2 | 80.0 | 76.9 | 77.9 |
| | | 29 | 80.4 | 79.2 | 78.9 | 77.3 | 76.8 | 81.1 |
| | Pattern 2 | 15 | 78.6 | 79.9 | 79.9 | 80.1 | 80.4 | 81.3 |
| | | 22 | 78.3 | 81.2 | 78.8 | 79.2 | 80.8 | 80.5 |
| | | 29 | 77.9 | 80.8 | 79.3 | 78.1 | 77.7 | 82.2 |
| | Pattern 3 | 15 | 81.0 | 79.7 | 77.4 | 77.9 | 80.9 | 77.6 |
| | | 22 | 78.9 | 79.1 | 80.0 | 79.7 | 79.4 | 75.9 |
| | | 29 | 80.9 | 77.5 | 77.7 | 80.6 | 79.2 | 78.5 |
| | Pattern 4 | 15 | 79.7 | 79.5 | 77.9 | 79.5 | 81.7 | 78.0 |
| | | 22 | 78.9 | 77.9 | 80.4 | 82.2 | 78.9 | 78.8 |
| | | 29 | 77.9 | 79.7 | 79.0 | 78.0 | 80.2 | 80.8 |
| 0.08 | Pattern 1 | 15 | 80.5 | 79.5 | 78.6 | 80.6 | 79.2 | 78.7 |
| | | 22 | 78.9 | 78.7 | 78.8 | 78.9 | 80.7 | 80.3 |
| | | 29 | 76.6 | 78.0 | 78.3 | 80.9 | 78.6 | 80.4 |
| | Pattern 2 | 15 | 81.0 | 79.3 | 78.7 | 82.0 | 80.5 | 80.1 |
| | | 22 | 82.4 | 80.6 | 80.0 | 78.0 | 79.6 | 79.4 |
| | | 29 | 79.2 | 76.9 | 81.9 | 78.3 | 78.8 | 79.7 |
| | Pattern 3 | 15 | 78.2 | 81.6 | 80.9 | 79.1 | 79.2 | 77.5 |
| | | 22 | 80.9 | 79.5 | 78.6 | 79.2 | 78.3 | 81.4 |
| | | 29 | 80.4 | 79.3 | 77.5 | 77.9 | 80.2 | 82.3 |
| | Pattern 4 | 15 | 79.4 | 79.4 | 78.1 | 78.6 | 77.4 | 78.8 |
| | | 22 | 81.3 | 78.4 | 78.4 | 80.6 | 79.4 | 80.4 |
| | | 29 | 79.9 | 79.3 | 79.8 | 79.5 | 79.7 | 81.2 |
| 0.06 | Pattern 1 | 15 | 81.2 | 80.5 | 79.0 | 77.8 | 78.7 | 79.6 |
| | | 22 | 80.0 | 81.7 | 79.8 | 80.7 | 80.5 | 80.2 |
| | | 29 | 81.2 | 78.7 | 79.2 | 81.2 | 79.7 | 80.1 |
| | Pattern 2 | 15 | 78.7 | 77.5 | 81.4 | 80.7 | 81.0 | 80.7 |
| | | 22 | 80.6 | 81.8 | 79.2 | 80.3 | 81.6 | 80.2 |
| | | 29 | 78.5 | 80.2 | 80.0 | 77.7 | 78.1 | 78.0 |
| | Pattern 3 | 15 | 78.1 | 80.0 | 80.9 | 79.7 | 79.3 | 78.8 |
| | | 22 | 81.2 | 80.2 | 80.0 | 78.3 | 82.2 | 81.1 |
| | | 29 | 79.6 | 81.6 | 79.8 | 80.2 | 81.6 | 76.9 |
| | Pattern 4 | 15 | 78.2 | 79.8 | 78.9 | 79.5 | 77.3 | 79.2 |
| | | 22 | 79.2 | 81.1 | 79.4 | 76.8 | 79.2 | 80.4 |
| | | 29 | 79.9 | 78.5 | 79.8 | 80.1 | 78.9 | 81.8 |


“Max” is the day on which the maximal proximal effect is attained, and $\bar d = (1/T)\sum_{t=1}^{T} d(t)$ is the average standardized treatment effect. Bold numbers are significantly (at the .05 level) lower than .80.
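Table 10B: Simulated type 1 error rate (%) and power (%) when working assumption (c) is violated. The trends of $\sigma_t$ are provided in Figure [?]. The standardized average effect is 0.1, and $E[I_t] = 0.5$. The associated sample sizes are 41 and 42 when the day of maximal effect is 22 and 29, respectively.

Type 1 error rate (%):

| $\phi$ in AR(1) | [?] | Max = 22, const. | Max = 22, trend 1 | Max = 22, trend 2 | Max = 22, trend 3 | Max = 29, const. | Max = 29, trend 1 | Max = 29, trend 2 | Max = 29, trend 3 |
|---|---|---|---|---|---|---|---|---|---|
| -0.6 | 0.8 | 4.1 | 4.3 | 3.3 | 5.4 | 4.7 | 4.9 | 2.8 | 4.1 |
| | 1.0 | 4.6 | 5.0 | 4.0 | 4.4 | 4.4 | 4.8 | 4.2 | 4.3 |
| | 1.2 | 3.8 | 4.5 | 5.2 | 5.5 | 4.3 | 4.1 | 4.5 | 3.8 |
| -0.3 | 0.8 | 5.2 | 4.7 | 4.0 | 3.4 | 5.4 | 4.9 | 6.2 | 4.5 |
| | 1.0 | 4.9 | 4.5 | 4.5 | 4.3 | 5.2 | 5.1 | 4.0 | 3.7 |
| | 1.2 | 5.4 | 4.6 | 4.1 | 3.8 | 3.7 | 5.2 | 4.3 | 5.0 |
| 0 | 0.8 | 4.8 | 4.0 | 4.1 | 3.9 | 4.7 | 5.2 | 3.7 | 4.2 |
| | 1.0 | 5.4 | 4.0 | 5.8 | 3.9 | 4.1 | 4.0 | 5.9 | 5.7 |
| | 1.2 | 4.4 | 4.9 | 5.0 | 4.6 | 3.7 | 4.8 | 4.4 | 4.9 |
| 0.3 | 0.8 | 5.3 | 4.4 | 4.7 | 3.2 | 4.6 | 5.4 | 5.6 | 4.1 |
| | 1.0 | 5.5 | 4.0 | 3.4 | 3.7 | 5.0 | 4.6 | 4.0 | 3.6 |
| | 1.2 | 3.8 | 4.5 | 4.5 | 4.8 | 4.5 | 5.0 | 6.2 | 4.3 |
| 0.6 | 0.8 | 5.5 | 3.9 | 5.3 | 3.8 | 3.3 | 3.5 | 5.1 | 4.2 |
| | 1.0 | 4.0 | 3.7 | 5.2 | 5.1 | 4.8 | 5.1 | 5.0 | 4.7 |
| | 1.2 | 4.5 | 5.1 | 4.6 | 4.9 | 4.5 | 4.4 | 4.7 | 4.8 |

Power (%):

| $\phi$ in AR(1) | [?] | Max = 22, const. | Max = 22, trend 1 | Max = 22, trend 2 | Max = 22, trend 3 | Max = 29, const. | Max = 29, trend 1 | Max = 29, trend 2 | Max = 29, trend 3 |
|---|---|---|---|---|---|---|---|---|---|
| -0.6 | 0.8 | 82.8 | 82.7 | 83.7 | 79.9 | 83.6 | 80.6 | 88.7 | 79.2 |
| | 1.0 | 81.1 | 79.1 | 79.9 | 74.8 | 77.7 | 74.3 | 84.8 | 70.4 |
| | 1.2 | 76.6 | 76.3 | 76.3 | 70.6 | 77.6 | 72.0 | 80.7 | 70.4 |
| -0.3 | 0.8 | 83.0 | 83.0 | 86.0 | 80.3 | 82.7 | 79.2 | 87.9 | 78.0 |
| | 1.0 | 77.6 | 81.4 | 80.7 | 74.9 | 79.1 | 74.5 | 86.0 | 73.7 |
| | 1.2 | 78.2 | 76.9 | 77.3 | 73.4 | 74.4 | 71.2 | 81.0 | 70.7 |
| 0 | 0.8 | 84.6 | 84.6 | 82.1 | 79.0 | 81.8 | 81.5 | 88.0 | 78.0 |
| | 1.0 | 80.1 | 78.6 | 80.9 | 73.6 | 77.7 | 76.5 | 86.1 | 71.8 |
| | 1.2 | 76.0 | 76.7 | 77.4 | 70.6 | 74.5 | 69.9 | 83.4 | 69.6 |
| 0.3 | 0.8 | 83.6 | 79.7 | 84.6 | 79.7 | 82.1 | 81.7 | 88.2 | 75.7 |
| | 1.0 | 81.5 | 82.4 | 82.3 | 73.9 | 79.5 | 74.6 | 85.1 | 71.5 |
| | 1.2 | 74.8 | 76.6 | 78.2 | 71.1 | 75.5 | 71.1 | 82.5 | 70.1 |
| 0.6 | 0.8 | 81.4 | 83.1 | 83.5 | 80.5 | 83.1 | 77.1 | 86.6 | 76.9 |
| | 1.0 | 80.7 | 76.4 | 79.0 | 74.8 | 80.4 | 73.4 | 84.7 | 76.8 |
| | 1.2 | 77.0 | 77.5 | 77.0 | 73.5 | 74.4 | 72.5 | 81.6 | 69.4 |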


$\phi$ is the parameter of the AR(1) process for $\{\epsilon_t\}$. Bold numbers are significantly (at the .05 level) greater than .05 (for the type 1 error rate) and lower than .80 (for power).
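The AR(1) error processes indexed by $\phi$ can be generated as in the sketch below. It rests on assumptions this section does not state: Gaussian innovations, scaled so that each $\epsilon_t$ has unit marginal variance, and a stationary starting value. The function names `ar1_errors` and `lag1_autocorrelation` are hypothetical, and the authors' generator may differ.

```python
import math
import random
from typing import List


def ar1_errors(T: int, phi: float, rng: random.Random) -> List[float]:
    """Simulate eps_1..eps_T with eps_t = phi * eps_{t-1} + u_t.

    Assumes |phi| < 1 and u_t ~ N(0, 1 - phi**2), so that every eps_t is
    N(0, 1) once eps_1 is drawn from the stationary N(0, 1) distribution.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("stationarity requires |phi| < 1")
    innovation_sd = math.sqrt(1.0 - phi * phi)
    eps = [rng.gauss(0.0, 1.0)]  # stationary start
    for _ in range(1, T):
        eps.append(phi * eps[-1] + rng.gauss(0.0, innovation_sd))
    return eps


def lag1_autocorrelation(x: List[float]) -> float:
    """Sample lag-1 autocorrelation, used here only as a sanity check."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    rng = random.Random(2024)  # seeded for reproducibility
    series = ar1_errors(100_000, phi=-0.3, rng=rng)
    print(round(lag1_autocorrelation(series), 3))
```

Seeding `random.Random` keeps each simulated trial reproducible, and for a long series the sample lag-1 autocorrelation should sit close to $\phi$.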


Table 11B: Simulated type 1 error rate (%) when working assumption (d) is violated. $E[I_t] = 0.5$. The average effect is 0.1 and the day of maximal effect is 29. $N = 42$.

| Parameters in $I_t$ | $\gamma_2$ | $\gamma_1=-0.1$ | $\gamma_1=-0.2$ | $\gamma_1=-0.3$ |
|---|---|---|---|---|
| $\eta_1=-0.1,\ \eta_2=-0.1$ | -0.2 | 5.7 | 3.2 | 3.9 |
| | -0.5 | 3.2 | 4.2 | 4.9 |
| | -0.8 | 4.2 | 5.1 | 5.5 |
| $\eta_1=-0.2,\ \eta_2=-0.1$ | -0.2 | 5.4 | 3.8 | 3.9 |
| | -0.5 | 4.4 | 4.4 | 4.8 |
| | -0.8 | 4.7 | 4.3 | 4.6 |
| $\eta_1=-0.1,\ \eta_2=-0.2$ | -0.2 | 4.5 | 5.0 | 5.0 |
| | -0.5 | 4.9 | 3.8 | 6.0 |
| | -0.8 | 4.7 | 4.8 | 4.8 |


$\eta_1, \eta_2$ are parameters in generating $I_t$; $\gamma_1, \gamma_2$ are coefficients in the model of $Y_{t+1}$. Bold numbers are significantly (at the .05 level) greater than .05.
Table 12B: Degradation in power when the average proximal effect is underestimated. The day of maximal effect is 29 and the average availability is 0.5.

| $\bar d$ in sample size formula | True $\bar d$ | Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4 |
|---|---|---|---|---|---|
| 0.10 ($N = 42$) | 0.098 | 76.2 | 78.9 | 77.6 | 78.6 |
| | 0.096 | 75.1 | 74.6 | 78.8 | 74.0 |
| | 0.094 | 73.7 | 70.7 | 75.4 | 73.4 |
| | 0.092 | 71.5 | 71.6 | 73.2 | 71.6 |
| | 0.090 | 68.9 | 68.4 | 69.6 | 67.3 |
| | 0.088 | 65.4 | 65.6 | 66.1 | 65.7 |
| | 0.086 | 66.4 | 67.9 | 65.2 | 66.7 |
| | 0.084 | 62.3 | 63.4 | 63.0 | 59.6 |
| | 0.082 | 60.0 | 60.2 | 60.5 | 58.2 |
| | 0.080 | 58.9 | 59.8 | 57.8 | 61.4 |
| 0.08 ($N = 64$) | 0.078 | 78.2 | 80.2 | 76.8 | 75.8 |
| | 0.076 | 77.3 | 76.7 | 76.2 | 75.4 |
| | 0.074 | 73.1 | 72.2 | 71.2 | 71.4 |
| | 0.072 | 70.7 | 71.0 | 69.4 | 68.2 |
| | 0.070 | 68.2 | 66.0 | 65.2 | 66.1 |
| | 0.068 | 65.5 | 64.3 | 64.6 | 65.7 |
| | 0.066 | 62.8 | 62.3 | 61.8 | 59.4 |
| | 0.064 | 61.9 | 58.5 | 59.5 | 62.1 |
| | 0.062 | 53.9 | 52.6 | 57.0 | 56.9 |
| | 0.060 | 54.6 | 51.1 | 54.8 | 53.4 |
| 0.06 ($N = 109$) | 0.058 | 75.6 | 76.9 | 74.0 | 78.1 |
| | 0.056 | 73.9 | 73.1 | 73.1 | 72.7 |
| | 0.054 | 68.6 | 71.1 | 69.3 | 68.5 |
| | 0.052 | 65.4 | 69.4 | 63.6 | 66.8 |
| | 0.050 | 61.0 | 62.8 | 64.1 | 63.2 |
| | 0.048 | 57.4 | 58.6 | 56.4 | 56.1 |
| | 0.046 | 53.6 | 53.4 | 52.9 | 54.8 |
| | 0.044 | 52.0 | 48.9 | 50.1 | 53.0 |
| | 0.042 | 45.7 | 43.9 | 44.9 | 46.4 |
| | 0.040 | 40.4 | 42.2 | 42.3 | 42.7 |


Table 13B: Degradation in power when the average availability is underestimated. The maximal treatment effect is attained on day 29 and the average proximal main effect is 0.1.

| $(1/T)\sum_{t=1}^{T} E[I_t]$ in sample size formula | True value | Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4 |
|---|---|---|---|---|---|
| 0.5 ($N = 42$) | 0.048 | 76.4 | 81.7 | 76.0 | 78.2 |
| | 0.046 | 73.9 | 75.5 | 73.6 | 75.8 |
| | 0.044 | 70.6 | 72.1 | 71.0 | 71.7 |
| | 0.042 | 70.8 | 70.6 | 74.2 | 70.3 |
| | 0.040 | 70.3 | 69.2 | 65.7 | 68.6 |
| | 0.038 | 66.0 | 66.8 | 67.8 | 67.0 |
| | 0.036 | 64.0 | 62.5 | 62.4 | 62.9 |
| | 0.034 | 60.8 | 61.3 | 59.4 | 63.9 |
| | 0.032 | 56.4 | 59.2 | 54.7 | 59.8 |
| | 0.030 | 51.4 | 53.1 | 51.9 | 54.5 |
| 0.7 ($N = 32$) | 0.068 | 79.5 | 76.1 | 79.1 | 75.0 |
| | 0.066 | 77.3 | 75.7 | 74.0 | 76.4 |
| | 0.064 | 74.5 | 74.7 | 73.5 | 77.1 |
| | 0.062 | 73.2 | 73.0 | 75.1 | 72.5 |
| | 0.060 | 69.8 | 70.5 | 73.5 | 72.5 |
| | 0.058 | 71.0 | 69.6 | 71.3 | 67.3 |
| | 0.056 | 68.8 | 70.3 | 66.6 | 64.0 |
| | 0.054 | 68.1 | 65.8 | 65.3 | 68.6 |
| | 0.052 | 62.4 | 64.9 | 65.6 | 62.9 |
| | 0.050 | 60.6 | 63.3 | 62.8 | 61.4 |